Editorial Note


The last twenty or thirty years have seen a remarkable increase in the amount of 
research devoted to fundamental problems in language and speech. This work has 
sprung not only from the practical needs of communication engineering but also from 
a realization which has spread ever more rapidly that speech is an aspect of human 
behaviour of the utmost importance for the study of mankind. Because language 
and speech function at many different levels, psychological, physiological and physical, 
in the individual and in human society, they offer a field for research to a great 
diversity of specialists. 

Developments in electro-acoustic technique have been largely responsible for a 
rapid increase in our knowledge of speech at the physical level. New ways of 
inspecting and describing the wave-motions of speech, new means of transmitting 
and recording speech sounds have all added to our knowledge and have, at the same 
time, raised new problems. Among other things, these advances have served to 
stress the importance of the human terminals in any communication chain and one 
result has been a large body of experimental work devoted to the perception and 
recognition of speech. A further consequence has been a growth of interest in 
language and its organization and particularly in language statistics which have 
assumed great importance in view of the formulations and the extensive use of
information theory.

In the realm of physiology and psychology, fresh interest in this field has arisen from
the study of disturbances of language and speech, at the neurological level, and also
from a growing awareness that speech, if examined in the right way, may serve as
a reliable index of mental states. The importance of these aspects of the subject has 
long been realized by neurologists and psychologists but fresh impetus has been given 
to this type of research by the development and exploitation of new experimental 
methods in recent years. 

The accounts of such varied research projects have naturally appeared in a large 
number of specialist journals, engineering, physical, medical and psychological, which 
have generously devoted space to subjects which were often, perhaps, on the periphery 
of their area of interest. This has had the advantage of bringing to the notice of a 
very wide range of readers the research being carried out in this field and has itself 
been partly responsible for the growth of the view that the study of language and 
speech may properly be considered as constituting a separate discipline. 

The purpose of this journal is to furnish a medium for publication in which the
fundamental problems of language and speech form the centre of interest. One of
its most important functions will be to offer space, not only for reports of experimental 
work, but for the extended treatment of conclusions which may be drawn from such 
work and for the development of theories which represent a synthesis of knowledge 
derived from experiment. It will attempt to deal with all aspects of the subject and 
it is therefore not possible to give an exhaustive list of the subjects that will be
covered; among them will be language structure, psychology of language and speech,
transmission and reception of speech, mechanical translation, mechanical speech 
recognition and synthesis, language statistics, abnormalities of language and speech. 

It is hoped that contributions will be made by a wide variety of research workers — 
linguists, philosophers, logicians, psychologists, physiologists, physicists, engineers, 
statisticians — and that the diverse material published may be unified in some degree 
by its relation to the central theme and purpose of the journal : an understanding of 
the fundamental problems of language and speech. 


CUES FOR THE DISCRIMINATION OF AMERICAN ENGLISH
FRICATIVES IN SPOKEN SYLLABLES* 


KATHERINE SAFFORD HARRIS 
Haskins Laboratories, New York 


It has sometimes been assumed that the identification of the fricatives of American
English in CV syllables depends primarily on the characteristics of the noise (i.e.,
nonvocalic) portion of the speech sound. A second possibility is that characteristics
of the vocalic portion, previously shown to be cues for the perception of other
consonants, are important for the fricatives. These alternatives were tested by
combining the noise from one spoken fricative-vowel syllable with the voiced portion of
another. Results indicate that the important cues for the fricatives /s/ and /ʃ/ are
given by the noise but that the differentiation of /f/ and /θ/ is accomplished primarily
on the basis of cues contained in the vocalic part of the syllable. Similar results were
obtained for the voiced counterparts of these sounds.


This paper will be concerned with the first steps in an investigation of the cues
which listeners use in discriminating among the members of the class of unvoiced
fricatives /f/, /θ/, /s/ and /ʃ/, and among their voiced counterparts, /v/, /ð/, /z/
and /ʒ/.

If one examines spectrograms of these sounds in consonant-vowel syllables, he finds 
them to be made up of two successive segments—a period of noise, which we shall 
call the friction, succeeded by a segment with well-marked formant structure, which 
we shall call the vocalic portion.¹ Cues for identifying the phonemes might well be in
either or both of the two portions. Indeed, one might infer from research with other 
groups of consonants that both parts are important. For example, the friction of the 
fricatives is much like the burst of the stop consonants ; it has previously been found 
that the frequency position of this burst is significant in determining which stop will 
be perceived (Liberman, 1952). Transitions of the second and third formant in the 
vocalic part of the syllable have been shown to be important for distinguishing among 
the liquids and semi-vowels (O’Connor, 1957), among the nasals, and among the stops 
(Liberman, 1954). 

As a step towards isolating the acoustic cues for the fricatives, then, it has seemed 
reasonable in this experiment to assess the overall relative contribution of the friction 
and vocalic portions of fricative-vowel syllables. Detailed examination of particular 
cues within each of the two parts will be left for later study. 


*This work was supported in part by the Carnegie Corporation of New York and in part by 
the Department of Defense in connection with Contract DA 49-170-sc-2159. Some of the 
results were reported at a meeting of the Acoustical Society of America in New York in 
June 1954.

¹ These characteristics of spectrograms of fricatives have been noted in two previous studies.
Potter, Kopp and Green (1947) have called the two parts of fricatives described above the ‘fill’
and the ‘consonant influence on the vowel’, while Joos (1948) has called them the ‘noise patch’
and the ‘glide’ or ‘transition’.
PROCEDURE


A simple means of assessing the relative importance of cues in the friction and 
vocalic portions would be to split the friction segment away from the vocalic segment 
of the syllable, and then recombine friction and vocalic portions from different 
syllables. Presumably, the more important part of the sound would determine which 
phoneme a listener would hear. 
Fig. 1. Test stimuli generated from the spoken syllables /fi/, /θi/, /si/, and /ʃi/. The dotted
line on the schematic spectrogram at the top of the figure indicates the point in the recorded
sound at which the magnetic tapes were cut. Each of the resulting four types of friction was
combined with each of the four types of vocalic portion to make 16 new combination stimuli,
as indicated in the lower half of the figure.


A similar technique of recombination has been used by Schatz (1954) in a study of 
changes in stop consonants from one vowel to another. In her study, magnetic tape 
recordings of the stop consonants were made, and then the burst of noise at the 
beginning of the stop was split away from the vocalic portion, and interchanged with 
the burst from another stop consonant-vowel syllable. In the present experiment, 
we have used the same technique to interchange friction and vocalic parts of different 
fricative-vowel syllables. 

The first step was to make tape recordings of a number of repetitions of each of 
the four syllables /fi/, /θi/, /si/, and /ʃi/, spoken by a single male speaker. Since
we wanted to split the syllables between friction and vocalic portions, we needed to
put a marker on the magnetic tape at the join of the two. To do this, the tape was run
back and forth over the playback head of a tape recorder by hand, and the output
monitored by listening and by watching the face of an oscilloscope; the join of the
friction and vocalic parts of the syllable could be seen and heard by the change from
low-intensity, high-frequency noise to high-intensity, low-frequency periodic sound 
waves. Through the use of this method, all the tapes were marked and cut at the 
marked points. A friction portion from each of the four syllable types was combined 
with a vocalic portion from each, to make 16 combination stimuli. The recombinations 
produced are schematized in Fig. 1. 


A similar set of 4 repetitions of each of the four fricatives before each of the vowels 
/e/, /o/, and /u/ was recorded, and each set was recombined to make 16 stimuli for 
each vowel, analogous to the combinations for the vowel /i/ described in the
preceding paragraph. All 64 stimuli, 16 for each of the four vowels, were then 
re-recorded. 
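The design just described crosses four fricatives with four vocalic portions for each of four vowels, and it can be summarised as a simple enumeration. The sketch below illustrates only the bookkeeping, not the tape work. The label strings and the function name `make_stimuli` are hypothetical.

```python
from itertools import product

# Hypothetical labels for the four unvoiced fricatives and the four vowels.
FRICATIVES = ["f", "θ", "s", "ʃ"]
VOWELS = ["i", "e", "o", "u"]

def make_stimuli(fricatives, vowels):
    """For each vowel, pair the friction of every fricative with the vocalic
    portion of every fricative: 4 x 4 = 16 stimuli per vowel."""
    return [
        {"vowel": vowel, "friction": friction, "vocalic": vocalic}
        for vowel, friction, vocalic in product(vowels, fricatives, fricatives)
    ]

stimuli = make_stimuli(FRICATIVES, VOWELS)
print(len(stimuli))  # 64
```

Sixteen of the 64 stimuli (four per vowel) join a friction and a vocalic portion from the same fricative. The remaining 48 are cross-spliced.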


The naturalness of the recombined syllables was dependent on the accuracy of 
the location of the original cut between the friction and the vocalic parts of the syllable. 
This location was checked after the rejoining operation and the re-recording by turning 
the original tape to the oxide side and running a magnetized knife blade along the 
splices between friction and vocalic parts of each syllable, so that there was a sharp 
click placed at the join of the two. When spectrograms were made, the click appeared 
as a black line at the point of join. One could then tell by visual examination whether 
or not the cut had been made at the point intended. (Of course, it was necessary to 
use the re-recording made before the insertion of the test clicks for the final test tape.) 


After all the rejoined syllables had been checked in the manner described above, 
the re-recording was made into a test tape by rearranging all the stimuli in random 
order, and spacing them in such a way that each syllable was repeated once after an 
interval of 0.9 sec., and successive pairs of syllables appeared 6 sec. apart. This
recording of the 64 stimuli will be referred to below as the unvoiced fricatives test. 
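The layout of the test tape can be sketched as a timing schedule. This is a minimal sketch under stated assumptions. Syllable durations are ignored, so both the 0.9 sec. repeat interval and the 6 sec. pair spacing are treated as onset-to-onset times. `make_test_schedule` is a hypothetical helper and does not describe the original equipment.

```python
import random

def make_test_schedule(stimulus_ids, repeat_gap=0.9, pair_spacing=6.0, seed=None):
    """Return a list of (onset_seconds, stimulus_id) events.

    The stimuli are put in random order and each is played twice; both
    repeat_gap and pair_spacing are treated as onset-to-onset times, which
    ignores the durations of the syllables themselves (an assumption).
    """
    order = list(stimulus_ids)
    random.Random(seed).shuffle(order)
    events = []
    for position, stimulus in enumerate(order):
        start = position * pair_spacing
        events.append((start, stimulus))               # first presentation
        events.append((start + repeat_gap, stimulus))  # immediate repeat
    return events

schedule = make_test_schedule(range(64), seed=1)
print(len(schedule))  # 128 presentations
```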


A similar test was made for the four voiced fricatives. The male speaker who 
recorded the unvoiced fricatives recorded /v/, /ð/, /z/ and /ʒ/, before the same
set of four vowels, /i/, /e/, /o/ and /u/. The syllables were recombined as before 
and checked for splice position. We should note, however, that the spectrographic 
checking technique used is somewhat less accurate in the case of a voiced fricative, 
since the boundary between friction and vocalic portions is harder to define from a 
spectrogram than the boundary for an unvoiced fricative. The 64 stimuli were made 
into a test in the manner described above ; this recording will be referred to below as 
the voiced fricatives test. 


The voiced and unvoiced fricatives tests were presented to 22 listeners, volunteers 
from undergraduate and graduate courses at the University of Connecticut. The 
subjects were present for two sessions, each of which contained one presentation of 
each test. Within the session, half the subjects heard the unvoiced fricatives test first, 
while half heard the voiced fricatives test first. All subjects were instructed to judge 
the stimuli as /f/, /θ/, /s/ or /ʃ/ for the former test, or as /v/, /ð/, /z/ or
/ʒ/ for the latter.
We thought that the judgments made might be affected by the intensity level of
the tests, since the friction portions of the different phonemes vary considerably in
intensity ; to control for this possibility we presented the stimuli at two different 
intensity levels in the two experimental periods. These intensity levels cannot be 
specified meaningfully in the usual units, because the listeners were seated in an 
experimental room listening to a loudspeaker, rather than listening on earphones. The 
two levels selected were, first, the lowest speaker output level at which the
experimenter could hear the friction when seated in the subject’s chair farthest from the
loudspeaker, and, second, the highest level at which the sound could be presented 
without distortion. Analysis of the data has shown that this variable had no effect on 
the listeners’ judgments over the range selected ; therefore, results are not presented 
separately for the two intensities. 
Fig. 2. Responses to stimuli made up of friction from one unvoiced fricative-vowel syllable
paired with the vocalic portion of another. Within each small rectangle, the height of each of
the four bars of the histogram indicates the per cent judgments of each of the four phonemes
/f/, /θ/, /s/, and /ʃ/. The data were obtained from 22 subjects, each of whom heard each
syllable twice.
RESULTS


The results for the unvoiced fricatives test are shown in Fig. 2. Each histogram 
represents the responses of the subjects to one of the 64 stimuli of the experiment. 
For example, the upper left-hand histogram in the upper left-hand quadrant indicates 
the responses of the subjects to a stimulus made up of /f/ friction and /f/ vocalic 
portion, with the vowel /i/. Approximately 95% of the subjects identified the
resulting syllable as /f/, while 5% heard the syllable as /θ/.


Before describing the results in detail, we should note that the results are the same 
for the four vowels. This can be seen by examining the corresponding histograms in 
each of the four quadrants, which represent the four vowels. We will therefore 
describe the data hereafter without reference to specific vowels. 


As can be seen in Fig. 2, the results of the test were quite different for /s/ and
/ʃ/, on the one hand, and /f/ and /θ/, on the other. When /s/ friction was paired
with any vocalic portion, the resulting stimulus was judged as /s/; similarly, when
/ʃ/ friction was paired with any vocalic portion, the resulting stimulus was judged
as /ʃ/. None of the other eight stimuli was judged as /s/ or /ʃ/ with any great
frequency. Apparently, then, the friction of /s/ and /ʃ/ provides the necessary and
sufficient cues for their identification, and overrides whatever cues may be provided
by the vocalic portions.


The situation was somewhat more complicated for /f/ and /θ/ judgments. In
general, only stimuli with /f/ or /θ/ friction were judged as either /f/ or /θ/, but
which of the two judgments was made depended largely on the vocalic part of the
syllable. Most of the listeners tended to judge a syllable with /f/ or /θ/ friction as /f/
when it had an /f/ vocalic portion, and as /θ/ when it had any other vocalic portion.²


The results of the voiced fricatives test, shown in Fig. 3, were similar
to those just described, though not as clear. The phonemes /z/ and
/ʒ/ behaved like their unvoiced counterparts /s/ and /ʃ/ for all vowels, in that both
/z/ and /ʒ/ were identified almost entirely by their friction portions. The results
for /v/ and /ð/, on the other hand, were variable from vowel to vowel. When the
vowel was /o/ or /u/, syllables with /v/ friction and /ð/ friction were identified as
/v/ or /ð/ depending on the vocalic portion; in other words, with these vowels the
sounds behaved in the same way as /f/ and /θ/. For /i/ and /e/, however, there
appeared to be some contribution from both friction and vocalic portions. In this
connection, it should be remembered that, as we noted above, the join of the friction
and vocalic portions is less clear for the voiced fricatives; consequently, the original
splicing may not have been made as accurately, and we might expect somewhat more
variability in the results.

Fig. 3. Responses to stimuli made up of friction from one voiced fricative-vowel syllable paired
with the vocalic portion of another. The data are displayed as in Fig. 2.

² Hughes and Halle (1956) have recently reported some experimental results which are in
general agreement with the results of the unvoiced fricatives test. In their study, several
speakers produced syllables containing /f/, /s/ and /ʃ/ with various vowels. The isolated
friction segments of the syllables were then presented to listeners who were asked which of the
three phonemes had been spoken. All three were identified quite well. The result is not
surprising for /s/ and /ʃ/, since we had concluded that friction provided the necessary and
sufficient cues for their identification. Furthermore, we would expect that /f/ friction would
be discriminable in the set of alternatives presented, since in our experiment /f/ friction was
not confused with any friction except /θ/, and /θ/ was not a possible response in Hughes and
Halle’s experiment.


By way of summary, then, we may say that the results suggest a general way of
describing the perception of the four fricatives (with their voiced counterparts). The
listener may be said to behave as if he first decides, on the basis of the friction, whether
the syllable belongs to the /s-ʃ/ class or to the /f-θ/ class. If it belongs to the /s-ʃ/ class,
he uses the friction again to decide between /s/ and /ʃ/. If, on the other hand, the first
decision is that the sound belongs to the /f-θ/ class, the listener uses the vocalic portion
to decide which of the two sounds, /f/ or /θ/, he has heard.
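The two-stage account above can be written as a small decision rule. The sketch is a deterministic idealization of the modal responses to the unvoiced fricatives in Fig. 2. It ignores listener variability and the vowel-dependent behaviour of /v/ and /ð/. The function and label names are hypothetical.

```python
# Hypothetical labels for the unvoiced fricatives.
SIBILANT_CLASS = {"s", "ʃ"}
NONSIBILANT_CLASS = {"f", "θ"}

def predicted_response(friction, vocalic):
    """Idealized two-stage listener for a cross-spliced unvoiced syllable.

    Stage 1: the friction decides the class, /s-ʃ/ or /f-θ/.
    Stage 2: within /s-ʃ/ the friction decides again; within /f-θ/ the
    vocalic portion decides, giving /f/ only when it came from an /f/ syllable.
    """
    if friction in SIBILANT_CLASS:
        return friction
    if friction in NONSIBILANT_CLASS:
        return "f" if vocalic == "f" else "θ"
    raise ValueError(f"unknown friction label: {friction!r}")

# Tally the predicted responses over the 16 friction/vocalic combinations.
labels = ["f", "θ", "s", "ʃ"]
tally = {}
for friction in labels:
    for vocalic in labels:
        response = predicted_response(friction, vocalic)
        tally[response] = tally.get(response, 0) + 1
print(tally)  # {'f': 2, 'θ': 6, 's': 4, 'ʃ': 4}
```

Under this rule, /f/ is reported only for the two stimuli whose vocalic portion comes from /f/. The other six stimuli with /f/ or /θ/ friction are reported as /θ/. This mirrors the pattern described for Fig. 2 but does not reproduce its percentages.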
THE RELATION BETWEEN THE FUNCTIONAL BURDENING
OF PHONEMES AND THE FREQUENCY OF OCCURRENCE 


G. HERDAN 


University of Bristol 


The frequency of occurrence of phonemes in a language may be derived from 
dictionary material or from continuous texts. This paper deals with the relation 
between the two sets of values for English. When distributions are plotted for English 
phonemes, classified according to manner and place of articulation, it is seen that 
there is a close similarity between the distribution for dictionary material and for 
continuous texts. The hypothesis is advanced and tested that the phoneme distribution 
in speech is a random sample of the phoneme distribution in dictionary material (the 
functional burdening of phonemes). 


The distinction between the two types of relative frequency of phonemes, viz. as 
the ratio of occurrence numbers to text length (in terms of total phoneme number) and 
as the ratio of their incidence in the dictionary to the total of phonemes there, is not 
new. It was probably stressed for the first time by Trubetzkoy in ‘Grundzüge der
Phonologie’, where he speaks of the ‘doppelte Relativität’ (double relativity) in phoneme counts.
However, the relation between the two distributions of relative frequencies has so 
far not been reliably established. Consequently, what we read about it is mere 
guesswork, based upon subjective impression — just the thing we want to avoid in 
quantitative linguistics as an empirical science. As a rule, the two distributions are 
regarded as basically different ; even so unsurpassed an expert in matters of phonology 
as Trubetzkoy seems to have taken this for granted, since otherwise he would not have 
used the term ‘double relativity’ (Trubetzkoy, 1939).

The purpose of this paper is to study the relation in question using the methods of 
quantitative linguistics which are now available to us. 

A very detailed investigation of English mono- and dissyllabic words in the Oxford 
Dictionary was carried out by Trnka in his book ‘A Phonological Analysis of Present-Day
Standard English’ (1935). That it is limited to mono- and dissyllables does
not materially detract from the generality of its results, because words of more than two
syllables (many of which are of foreign origin) are relatively infrequent in both the
spoken and the written language: according to Dewey (1923), their frequency of
occurrence does not exceed 8%. We shall return to this point when the question 
arises of what this means in terms of dictionary incidence. 

On the basis of Trnka’s work a detailed quantitative analysis of phonemes in English
was carried out by Kramsky, who grouped the English phonemes of the dictionary
material according to the manner of articulation and according to the place of
articulation or the organ of production. According to the manner of articulation he divides
the 23 English consonant phonemes into plosives and affricates: /b d g p t k tʃ dʒ/,
fricatives: /θ ð f v s z ʃ ʒ/ and /h/, and nasals and liquids: /l m n ŋ r w j/.
According to the place of production they are divided into labials: /p b m w f v/,
dentals and alveolars: /θ ð t d n l s z ʃ ʒ r/, palatals: /j/, velars: /k g ŋ/ and
glottal: /h/. The results may be summarised as in Table 1 (Kramsky, 1946).

TABLE 1
English consonant phonemes: dictionary count

                            Monosyllables    Dissyllables    Mono- & dissyllables
Manner of articulation
  Plosives & affricates     3122  (40.8%)    2579  (41.4%)    5701  (41.1%)
  Fricatives                1765  (23.1%)    1234  (19.9%)    2999  (21.6%)
  Nasals & liquids          2757  (36.1%)    2405  (38.7%)    5162  (37.3%)
  Total                     7644 (100.0%)    6218 (100.0%)   13862 (100.0%)
Place of articulation
  Labials                   1849  (24.2%)    1528  (24.6%)    3377  (24.4%)
  Dentals                   4525  (59.2%)    3771  (60.6%)    8296  (59.8%)
  Palatals                   106   (1.4%)     101   (1.7%)     207   (1.5%)
  Velars                    1164  (15.2%)     818  (13.1%)    1982  (14.3%)
  Total                     7644 (100.0%)    6218 (100.0%)   13862 (100.0%)

We note that there is a considerable similarity in the phoneme distribution, according
both to manner and to place of articulation, for monosyllables
and dissyllables. Incidentally, this explains why the quantitative text characteristic 
of word length in terms of syllable number which, as Trubetzkoy has already pointed 
out, is partly dependent on individual style, does not interfere with the far-reaching 
similarity of the phoneme distributions in texts from widely different writers. 

The similarity of the two phoneme distributions in mono- and dissyllables, especially 
according to the place of production, is such that we may regard them as random 
samples from one statistical population. We have therefore added to Kramsky’s table 
the distribution in both mono- and dissyllables taken together, which we regard as a 
series of population frequencies or probabilities. 
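The pooled column is the column-wise sum of the monosyllable and dissyllable counts, with percentages recomputed from the new total. Below is a minimal sketch for the place-of-articulation counts of Table 1. The helper name `pool_with_shares` is hypothetical.

```python
# Dictionary counts by place of articulation, from Table 1 (Kramsky, 1946).
MONO = {"Labials": 1849, "Dentals": 4525, "Palatals": 106, "Velars": 1164}
DI = {"Labials": 1528, "Dentals": 3771, "Palatals": 101, "Velars": 818}

def pool_with_shares(*columns):
    """Sum the counts column-wise and return {group: (count, percent)}."""
    pooled = {group: sum(col[group] for col in columns) for group in columns[0]}
    total = sum(pooled.values())
    return {group: (n, round(100 * n / total, 1)) for group, n in pooled.items()}

for group, (n, pct) in pool_with_shares(MONO, DI).items():
    print(f"{group:9s}{n:6d} ({pct}%)")
```

This reproduces the third column of Table 1: 3377 (24.4%), 8296 (59.8%), 207 (1.5%) and 1982 (14.3%).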

Corresponding figures for phoneme frequencies according to place of production 
in running texts are available from quite a number of investigations. We take as a 
representative sample the figures from an investigation by Fowler (1957). Four 
samples were examined, all chosen from modern English. Each sample was
transcribed into a phonemic alphabet of 24 consonants and 6 vowels. Three of the samples
consisted of 5000 consecutive phonemes each; the fourth, a story for children, is
complete in 501 phonemes. The samples of 5000 were each divided into 5 samples
of 1000 each, and the mean, standard deviation and coefficient of variation of the
sampling distribution of means were calculated for both the total and the partial
distributions. The results show that the type-token relation of phonemes is not 
significantly disturbed by individual differences in style (author or subject). 

The advantage in using Fowler’s work is that each text sample of 5000 phonemes 
is divided into 5 samples of 1000 phonemes each, which enables us to test the stability 
of phoneme distribution within and between samples. Grouping the consonant
phonemes in the running text samples in the same way as those of the dictionary 
material have been grouped, we get the distributions shown in Table 2. Dewey’s 
data, shown in the last column of Table 2, give a very similar distribution. 


TABLE 2
English consonant phonemes: running text count

Columns: (1) C. B. Boyer, Concepts of the Calculus, 2nd 1000 phonemes; (2) the same text,
4th 1000 phonemes; (3) the same text, 1st 5000 phonemes; (4) G. Greene, Fielding & Sterne,
1st 5000 phonemes; (5) average of columns (3) and (4); (6) Dewey.

              (1)           (2)           (3)            (4)            (5)             (6)
Labials     144 (23.2%)   135 (22.2%)    694 (22.7%)    750 (24.2%)    722   (23.5%)   21.3%
Dentals     407 (65.5%)   403 (66.4%)   1996 (65.4%)   2002 (64.5%)   1999   (64.9%)   67.3%
Palatals      6  (1.0%)     6  (1.0%)     30  (1.0%)     31  (1.0%)     30.5  (1.0%)    1.0%
Velars       64 (10.3%)    63 (10.4%)    334 (10.9%)    321 (10.3%)    327.5 (10.6%)   10.4%
Total       621 (100%)    607 (100%)    3054 (100%)    3104 (100%)    3079   (100%)    100%


The far-reaching similarity between the distribution of consonant phonemes in the 
1000 and 5000 phoneme samples from the same text, between the 5000 phoneme 
samples from different texts and between these and the very great sample from Dewey 
(22,477 consonant phonemes) is not surprising. The implied stability of phoneme 
occurrence in running texts has been established before (Herdan, 1956). What is 
remarkable is the close similarity between the phoneme distribution for dictionary 
material and for running texts, which can be seen by comparing Tables 1 and 2. 
However, it would appear that when used in speech, dentals are in excess, and velars 
in defect, of the corresponding dictionary frequencies. This could mean that there 
are two factors involved in the stability of relative frequency of phonemes in use : 
the pattern provided by the functional burdening of phonemes (in the dictionary) and, 
though to a much lesser extent, the need for divergence from it. 
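The comparison of Tables 1 and 2 can be made explicit by converting both to percentages and taking differences. Below is a minimal sketch using Table 1, column 3 and Table 2, column 5. The variable names and the helper `shares` are hypothetical.

```python
# Counts by place of articulation.
DICTIONARY = {"Labials": 3377, "Dentals": 8296, "Palatals": 207, "Velars": 1982}     # Table 1, col. 3
RUNNING_TEXT = {"Labials": 722, "Dentals": 1999, "Palatals": 30.5, "Velars": 327.5}  # Table 2, col. 5

def shares(counts):
    """Convert counts to percentages of their total."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}

dict_pct, text_pct = shares(DICTIONARY), shares(RUNNING_TEXT)
for group in DICTIONARY:
    print(f"{group:9s} dictionary {dict_pct[group]:5.1f}%  text {text_pct[group]:5.1f}%  "
          f"difference {text_pct[group] - dict_pct[group]:+5.1f}")
```

The dentals come out about 5.1 percentage points higher in running text than in the dictionary, and the velars about 3.7 points lower. Labials and palatals differ by less than one point.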


To test this assumption, we shall first try to derive the frequencies of phonemes in 
speech by a stochastic process from the pattern of functional burdening in the 
dictionary, and see what conclusions can be drawn from such numerical agreement 
as we obtain between the two series — text occurrence and dictionary incidence, or 
frequency of use of phonemes in speech and their functional burdening in ‘la langue’.

Incidentally, the similarity between the two distributions admits the conclusion that 
the words of three or more syllables in the dictionary will not exceed 8%, the value 
for the frequency of occurrence of such words in the text, which enables us to assess 
the error we are making by working with mono- and dissyllables only. 


The stability of the distribution of frequency of use, its independence of the type of 
text, points to a general function according to which phonemes are produced in speech.
Insofar as the phoneme distribution from texts is similar to that from the dictionary, it 
might well be considered to be a random sample of the latter, and a law of chance 
might present such a general function. We shall therefore use the Poisson law of rare 
events, which is most likely to fit the data in question if they are governed at all by 
chance, and calculate the numbers of phonemes belonging to the different categories 
according to place of production (Yule, 1944). If the calculated numbers sensibly 
agree with those actually present in the sample, and if the proportions are also like 
those observed, we would conclude that the frequencies of use of phonemes may be 
regarded as random samples of their functional burdening, and the distributions of 
the former as random samples of the distribution of the latter. 

For each of the four phoneme groups the number, n, of consonant phonemes to 
be expected in a text of specified length, N, is calculated according to the Poisson law 
of rare events as 

$$n = L\left(1 - e^{-Np/L}\right)$$
where L is the number of phonemes of the group in question in the dictionary count (Table 1,
column 3), and p is the probability of such phonemes in running text (Table 2, column 5, average).
More precisely, L is the number of occasions on which a phoneme belonging to a
specified category appears in phonological opposition, disregarding the possibility
that in a number of cases the opposition may not be phonologically relevant
(Trubetzkoy’s ‘Aufhebungsstellung’, position of neutralization).

The argument implied in the above formula is as follows: according to the law of
rare events, the probability that a given one of the L phonological oppositions does not occur
in a sample of one phoneme is $e^{-1/L}$; the probability that it does not occur in the portion Np
of a text of length N is $e^{-Np/L}$, and consequently the probability of its occurrence is
$1 - e^{-Np/L}$. The probable number of phonological oppositions of the group in a text of
length N is then $L(1 - e^{-Np/L})$.
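The argument treats each of the L oppositions of a group as equally likely to occur on every occurrence of a phoneme of that group. Under that assumption (ours, made explicit here), the exact expected number of distinct oppositions met in Np independent draws is L(1 - (1 - 1/L)^{Np}), and L(1 - e^{-Np/L}) closely approximates it. The sketch below checks this by simulation. The function names are hypothetical. The figures L = 207 and Np = 5000 × .010 = 50 are those for the palatals in Table 3.

```python
import math
import random

def expected_distinct(L, draws):
    """The formula in the text: L * (1 - exp(-draws / L)), with draws = N * p."""
    return L * (1 - math.exp(-draws / L))

def simulated_distinct(L, draws, trials=2000, seed=0):
    """Average number of distinct items hit when `draws` items are chosen
    uniformly at random, with replacement, from L equally likely items."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(L) for _ in range(draws)})
    return total / trials

# Palatals with N = 5000: L = 207, Np = 50.
print(round(expected_distinct(207, 50), 1), round(simulated_distinct(207, 50), 1))
```

With L = 207 and Np = 50 the formula gives about 44.4. The simulated average should lie within a few tenths of it, and the small gap reflects the difference between (1 - 1/L)^{Np} and e^{-Np/L}.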


The numerical data which form the basis of the calculations are listed in Table 3. 


TABLE 3

            Occurrence in vocabulary, L    Probability, p
            (Table 1, col. 3)              (Table 2, col. 5)
Labials       3377                          .235
Dentals       8296                          .649
Palatals       207                          .010
Velars        1982                          .106
Total        13862                         1.000
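As a worked instance of the formula, take the labials with N = 5000. Table 3 gives L = 3377 and p = .235, so Np = 1175, and (our arithmetic):

$$n = 3377\left(1 - e^{-1175/3377}\right) = 3377\left(1 - e^{-0.3479}\right) \approx 3377 \times 0.29386 \approx 992.4,$$

This agrees closely with the 992.8 given for labials in Table 4.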


We see from Table 4 that the sample occurrence of 1000 consonant phonemes is
accounted for as to 96%, and that of 5000 as to 84%, by the theoretically expected
number of phonemes, using a law of chance. This means that the number of different
consonantal phonological oppositions in the text samples is, to that extent, accounted
for by those in the dictionary, representing the functional burdening of phonemes. The
remaining 4% and 16% are repetitions. Moreover, there is a striking agreement
between the actually observed relative frequencies in running texts and those
calculated from the dictionary probabilities, using a law of chance, in the proportion
of phonemes accounted for by the four phoneme groups (cf. Table 2, col. 5 and
Table 4).

TABLE 4
Calculated frequencies for samples of different size

            N = 1000         N = 5000
Labials     228.3 (23.6%)     992.8 (23.6%)
Dentals     622.0 (64.3%)    2687.9 (64.1%)
Palatals      9.7  (1.0%)      44.5  (1.1%)
Velars      107.0 (11.1%)     465.8 (11.1%)
Total       967.0 (100%)     4191   (100%)
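The whole of Table 4 can be recomputed from Table 3 in a few lines. This sketch follows the calculation as described in the text and is not Herdan's own procedure. The names are hypothetical. It gives totals of about 964 for N = 1000 and about 4187 for N = 5000, which round to the 96% and 84% quoted above. The group figures are close to, though not identical with, those printed in Table 4. The largest difference is for the velars at N = 1000 (about 103 against 107.0).

```python
import math

# Table 3: dictionary occurrences L (Table 1, col. 3) and text probabilities p (Table 2, col. 5).
TABLE3 = {
    "Labials": (3377, 0.235),
    "Dentals": (8296, 0.649),
    "Palatals": (207, 0.010),
    "Velars": (1982, 0.106),
}

def expected_counts(N, table=TABLE3):
    """n = L * (1 - exp(-N * p / L)) for each phoneme group."""
    return {group: L * (1 - math.exp(-N * p / L)) for group, (L, p) in table.items()}

for N in (1000, 5000):
    counts = expected_counts(N)
    total = sum(counts.values())
    print(N, {group: round(n, 1) for group, n in counts.items()},
          f"total {total:.1f}, i.e. {100 * total / N:.0f}% of N")
```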


Functional burdening thus appears as the dominant factor in the use of phonemes in 
speech, and the deviation of the frequencies of certain phoneme groups in speech 
from those in the dictionary (dentals and velars) is not an argument against the 
derivation of the former by a chance mechanism from the latter. The mutual relation 
between phonemes as regards functional burdening determines that of the categories 
of phonemes in speech output. This is what gives a language its specific character 
and makes us at once aware of the difference between two languages when heard
spoken, even though we may not understand a word of either. The stability of the
distribution of functional burdening among the phonemes of a language, and the
characteristic differences in this respect between nations, are reminiscent of other racial
characteristics, e.g. blood-group distributions. For purposes of comparison, Table 5
gives phonemic distributions for five languages (Kramsky, 1946). 


TABLE 5

                               Engl.   Ital.   Pers.   Turk.   Hung.
                                 %       %       %       %       %
Manner of articulation
  Plosives                     41.1    42.36   29.8    36.19   39.23
  Fricatives                   21.6    16.74   39.1    31.47   25.05
  Nasals                       37.3    40.90   31.1    32.34   35.72
Place of articulation
  Labials                      24.4    20.67   20.26   18.05   17.50
  Dentals                      59.8    62.84   48.37   60.20   57.64
  Palatals                      1.5     6.25   10.59    4.32    9.19
  Velars                       14.3     9.97   20.78   17.43   15.69


Our results admit of the conclusion that the distribution of phonemes according to 
mode and place of production when used in speech can be regarded as the phenotype 
of the functional burdening, and with it of the phonemic structure as the genotype 
of the language, in the same way as other parts of the phenotype of the human species
can be regarded as manifestations of the genotype. In other words, the functional
burdening of a phoneme in speech is closely similar to its functional burdening in
the dictionary or in ‘la langue’. This means that Trubetzkoy’s ‘double relativity’ is
an illusion in this case: the functional burdening of a phoneme in use is only a
random sample of that in the dictionary.
BRAIN DISORDERS AND LANGUAGE ANALYSIS


A. R. Luria 
(University of Moscow and Academy of Pedagogical Sciences) 


Separate regions of the cortex form complicated systems for the analysis and 
synthesis of visual, auditory, kinaesthetic and motor stimuli. Focal lesions of the
brain produce a break-down in such analysis and synthesis and lead to a secondary 
disturbance of a whole complex of functions. 

Lesions in the left temporal zone cause a break-down in the discrimination and 
generalisation of sound patterns, and above all in phonemic auditory perception. This
in turn affects pronunciation and writing, and also the structure of word-meanings. 
This failure at the lexical level affects the various parts of a word in different degrees: 
the roots of words, which carry the more concrete meaning, are lost, while suffixes, 
with their more abstract meaning, are retained. 

Clinical observation of different cases throws light on the distinction between
the “communication of events” and the “communication of relations”. Lesions
in the parietal and parieto-occipital areas produce a failure in dealing with
“communications of relations”, a failure to combine a number of elements into a single
whole. When these lesions extend to the borders of the speech area, they entail a
further break-down in operations that require the abstraction of a scheme of reference.
“Communication of relations” necessitates precisely this operation, and this explains
why patients with parieto-occipital lesions are often unable to deal with such
communications.

Lesions in the fronto-temporal areas affect the synthesis of successive elements and 
hence lead to a failure in “propositionizing”.

In all cases of parieto-occipital lesion the “regulating” function of speech 
remains intact. Animal experiments have shown that ablation of the frontal sections 
of the cortex completely upsets the regulation of the animal’s behaviour. In man, 
frontal lesions interfere with the regulating function of speech without affecting other 
speech functions. 


Observation of aphasia—and more broadly of brain disorders of various kinds— 
has always provided invaluable material for analysis of the structure of human speech, 
and hence for a better understanding of some aspects of language structure. What 
is an indivisible whole in a normal person, and therefore difficult to analyse, in a
pathological condition is split up and becomes accessible to analytical investigation, 
which is always concerned with the isolation of the basic components of a complex 
whole. The task of isolating such components in human speech and studying the 
complicated functional systems that directly depend on these components is the 
immediate purpose of utilizing pathological conditions of the brain for psychological 
and linguistic analysis. 
CORTICAL SYSTEMS AND SPEECH ANALYSIS


Pathological conditions of the brain, and above all those which occur as a result 
of limited local affections, never produce a direct disintegration of the complex 
formation of language—morphology, syntax, lexicology or semantics. The time when 
research workers tried to find in the cerebral cortex special “centres” for under- 
standing or pronunciation of words, for morphological, syntactic or semantic forma- 
tions as well as for writing, reading and counting, has irrevocably gone. Attempts to 
correlate directly the localised affections of the brain with various aspects of language, 
as was done by Head, and attempts to describe special forms of verbal, nominal, 
syntactic and semantic aphasia can have no other value than as the initial summary 
of clinical observations ; to this must be added an understanding of those dynamic 
processes that have led to the emergence of the syndrome. 


At present, thanks to the results achieved by normal morpho-physiology, by the 
clinical study of focal lesions and by the combined efforts of physiologists and 
psychologists in the study of syndromes which arise in localised affections of the brain, 
the position has radically changed ; research workers can now formulate certain basic 
theses which make the data of brain pathology incomparably more fruitful for the 
analysis of normal functions. 


It has been established by modern morpho-physiology that separate regions of the 
cerebral cortex form the most complicated apparatus for specific forms of analysis 
and synthesis of “ exteroceptive ” or “ proprioceptive ” stimuli ; Pavlov quite justifiably 
called these the “cortical analyser terminals”. Under normal circumstances the 
function of this apparatus proceeds with sufficient force, balance and mobility of the 
nervous processes and it guarantees the finest analysis and synthesis of visual, auditory, 
kinaesthetic or motor stimuli. On the other hand, destruction of the brain tissue 
within the limits of one section or another, or such changes as can be observed in a 
limited break-down in the dynamics of the blood supply or of the cerebro-spinal fluid, 
alter the conditions of its functioning and reduce the strength of the nervous processes 
within the limits of the given functional system ; the balance of stimulatory and 
inhibitory processes suffers and their normal mobility is replaced by pathological inertia. 


Naturally, under any of these conditions normal function of the given “ analyser ” 
is disturbed, the ability to differentiate stimuli in adjacent systems loses its precision, 
and the whole process of analysis and synthesis within the limits of the modality with 
which the particular cortical system is concerned, acquires an imperfect and sometimes 
even a pathological character. This disturbance of normal processes of analysis and 
synthesis within the limits of a given functional system (visual, auditory, kinaesthetic 
or motor) is a direct primary result of every focal lesion of the cortex.








However, this immediate, primary effect of focal lesions never remains an isolated one.
The real units of brain function are those most complicated systems of temporal con- 
nections which lead to the well-known forms of adaptive activity and which in a human 
being assume the form of objective activity—active or passive speech, writing or 
reading, counting or the solution of cognitive problems. Every one of the very com- 
plicated functional systems that have been formed in the history of social development 
includes in its composition many functional components and can normally be realized
only in the presence of a number of physiological conditions. Naturally, in order to 
distinguish speech sounds a precise acoustico-articulatory analysis is necessary, just as 
the articulation of any word requires the conservation of those kinaesthetic impulses 
that alone make possible the distinction between similar articulations. It is perfectly 
clear therefore that when the action of one analyser which takes part in the realization 
of any of these activities is broken down, then normal existence of the functional system
as a whole becomes impossible, and there is a selective pathological effect on all that
most closely depends on normal action of a given analyser. This break-down, which
occurs as the result of a primary disturbance of the action of one of the analysers, is 
the secondary or systemic effect of the given lesion. These secondary or systemic effects 
of the partial affection also constitute the basic content of the pathology of localized 
disturbances of cortical function. These effects include the symptoms of aphasia,
agnosia or apraxia, the study of which for analysing the structure of the speech
processes is the main subject of this article. 


It would be wrong, however, to suppose that the complex speech system becomes 
equally disintegrated with the primary break-down of different components. The 
various components of speech activity do not occupy an equal place in the complicated 
functional system of speech ; auditory analysis and synthesis, the kinaesthetic differenti- 
ations required for precise articulation, analysis and synthesis of space and time relations 
—all this is necessary, but in different degrees, for the hearing and understanding of 
speech, for the active pronunciation of sound or words, for writing or reading. It is 
quite natural therefore that a defect in the action of one analyser or another, arising as 
a result of a focal lesion of the brain, inevitably leads to a secondary disturbance of a 
whole complex of functions, normal realization of which depends on its conservation ; 
therefore the disintegration of the speech system, which arises as a result of the 
primary disturbance of each of these physiological functions, will vary in character.
Just as an experienced cardiologist can, according to the character of the disturbance 
in heart function, draw conclusions about the place in which the damage has occurred, 
so an experienced neurologist who is studying the character of speech disorders can
with confidence deduce which of the primary defects lies at its base. And it is 
precisely because of this that the pathological method is so valuable for the discovery 
of those hidden peculiarities of the structure of speech and of the mechanisms on 
which language is based, and the analysis of which has inevitably escaped objective 
investigation. 
TEMPORAL ZONE LESIONS AND DISINTEGRATION OF PHONEMIC ORGANIZATION OF SPEECH


Clinicians are well acquainted with cases where an affection of the postero-superior 
sections of the left temporal zone (in the right-handed patient), the so-called Wernicke 
area, leads to a break-down in the comprehension of speech ; to the patient, words 
begin to sound like inarticulate noises and their meaning is no longer taken in. Such 
break-downs, as the outstanding linguist, Jakobson, has so aptly remarked, lead to 
disturbances of the elementary sound code of speech, and at the same time lead to 
a break-down in the normal designation of objects, in the process of writing and to 
peculiar disturbances in thinking. These disturbances of speech are widely known 
under the name of “sensory” aphasia, but the mechanisms which lead to this 
condition continue to provoke lively arguments. It has remained obscure precisely 
which of the primary break-downs lead to such disturbances of speech, whether it is 
a “defect of hearing” or a “defect of thinking”, and it is difficult to explain the 
remarkable diversity of speech manifestations that are lost in such cases. 


Analysis of a considerable number of cases of left temporal lesions, carried out by 
Soviet clinicians, physiologists and psychologists, during the last decades, offers a 
closer approach to the solution of some of these problems. Investigations have shown 
that a lesion of the area in which we are interested, which is a part of the cortical end 
of the acoustic analyser system, does not produce (as was thought in classical 
neurology) any loss of hearing for any part of the frequency range, but inevitably leads 
to damage in the process of differentiation and generalization of sounds, in other 
words, in the processes of sound analysis and synthesis. This is evident from the fact 
that such patients easily form conditioned reflexes in response to sounds, but 
experience considerable difficulty when they are faced with the problem of differenti- 
ating complex groups of sounds, and these difficulties are very great both in the 
attempt to form differentiated reactions to chords which differ from each other by the 
presence of different components, and especially in attempts to work out a distinction 
between two series, in which sounds are arranged in different sequence (ABCD and 
ACDB) (Traugott, 1956, Babenkova, 1954, Kabelyanskaya, 1955, and others). It is 
characteristic that a corresponding differentiation of visual complexes remains in these 
cases relatively intact. 


It seems, however, that the most essential point is that this break-down of sound analysis
and synthesis does not remain within the limits of elementary sound complexes, but 
is seen particularly clearly in the differentiation of similar phonemes ; a break-down 
in phonemic auditory perception may be recognized with every justification as a basic 
symptom of a lesion in the region we are interested in. 


As a result of work, the foundations of which were laid in the modern phonetic 
theory of Baudouin de Courtenay (1882), Shcherba (1912), Trubetzkoy (1939) and 
Jakobson (1956), it has become clear that the phonetic structure of language depends
on a system of sound oppositions, which are variously arranged in different languages, 










but in which invariably one of the phonetic features (voice—voicelessness, accent— 
absence of accent, palatalization—absence of palatalization, etc.) plays a basic part, 
acting as a signal that differentiates meaning. The discrimination of these phonetic 
features, produced in the sound-complex by means of a variation in the articulatory 
processes, provides the mechanisms necessary for speech perception.


Fig. 1. Disintegration of phonemic acoustic perception in variously located brain damage (the 
columns show the percentage of break-downs in the total number of patients with 
corresponding location of the affection; each group comprises 60-90 cases).


It is these systems of discrimination that are broken down in cases of lesions of the 
postero-superior sections of the temporal area of the dominant hemisphere, perhaps 
because this apparatus stands in the closest functional and morphological relation to the 
inferior sections of the kinaesthetic and motor area of the cortex, which play a direct part
in the act of articulation, and this part of the brain can be conceived of only as forming 
a single auditory-articulatory system. 
The breakdown of phonemic auditory perception is a fundamental and persistent
symptom accompanying lesions of the left temporal area, and may be easily revealed
by simple experiments, such as asking the patient to repeat similar (but opposed)
phonemes (e.g. /b-p, t-d, z-s/, etc.) or attempting to elicit from him definite
differentiated reactions (e.g. raising the hand in response to the voiced /b/ and not raising it in
response to the voiceless /p/). Fig. 1 gives a summary of results obtained during the 
war from the analysis of more than 800 cases of bullet-wounds of the brain, and it 
shows that the breakdown in phonemic auditory perception accompanies only lesions 
of the postero-superior sections of the left temporal area (or of parts adjoining them, 
in which case the break-down is a secondary effect), and is not typical of lesions of 
other sections of the brain. Further observations have made it possible to ascertain 
that this remains the most persistent symptom in residual phases of traumatic illness, 
is one of the early indications of the growth of a tumour in this area, and can be 
disclosed by means of special sensitized tests even in the most
serious cases of these lesions. 

The breakdown of phonemic auditory perception at once produces a series of 
secondary disorders : it inevitably leads to a break-down in the language system of all 
those formations in which it took part and which require precise phonemic auditory 
perception as an indispensable condition of their normal functioning. 

The break-down of precise phonetic analysis and synthesis leads above all to a break- 
down in the pronunciation of words, especially where this pronunciation does not have 
an automatic character. It is sufficient to ask the patient to repeat a word which is new 
and phonetically difficult in order to ascertain this. It leads to particularly severe 
break-downs in writing, even when the ability to copy a given text without difficulty 
is preserved ; such a patient proves to be incapable of writing any word at dictation 
or spontaneously (again, provided it is not sufficiently automatized), has difficulty in 
distinguishing its component sounds, and substitutes similar but ‘opposed’ phonemes.

Primary break-down of phonemic auditory perception leads, however, to more 
extensive consequences, exerting its own peculiar effect on the structure of the 
meanings of words, and essentially breaking down the language vocabulary formerly 
possessed by the patient. A patient with break-down of phonemic analysis 
loses the ability to take in words and to differentiate their meanings. If /telo/ and
/delo/, /totʃka/ and /dotʃka/ come to sound similar, the meanings of these words
cease to be clearly differentiated; if words with a complicated phonetic composition
(e.g. with a consonant cluster) become for the patient insufficiently articulated groups,
their meaning naturally ceases to be marked off from the meaning of other words;
the superficial phonetic similarity, which is over-ruled by the precise phonemic
structure of normal speech, is given free rein, and the patient tends to class together
words which have something in common in their sound, but which have never before
been put into the same category of meaning. Thus /otʃen/ and /osen/, /kolos/,
/golos/ and /xolost/, which were never formerly confused, easily begin to replace
each other, and the basic meaning-structure of the patient’s vocabulary disintegrates,
giving way to phonetic connections no longer reduced to an orderly system. The
disintegration of the phonemic structure of language inevitably leads to a break-down
in its lexical structure ; the break-down of the understanding of speech appears as a 
result not of intellectual disorder, but of the disintegration of the complex auditory 
function and it produces a failure of the power to use the systematized language code. 
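The mechanism just described can be illustrated by collapsing a single opposition and then grouping the words that become identical. In this case a lost phonemic opposition makes formerly distinct words sound alike. The following is a minimal Python sketch under stated assumptions. It uses a simplified Latin transliteration of the article's Russian examples (telo/delo, tochka/dochka, kolos/golos), and the pairing table and function names are hypothetical. It models only the voicing opposition, so words such as /kolos/ and /xolost/, which differ in other features, are not merged.

```python
from collections import defaultdict

# Hypothetical map from voiced consonants to their voiceless partners,
# in a simplified Latin transliteration (not the article's notation).
DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s", "v": "f"}


def neutralize_voicing(word):
    """Rewrite a word as heard by a listener who no longer perceives the
    voiced/voiceless opposition: every voiced consonant becomes voiceless."""
    return "".join(DEVOICE.get(ch, ch) for ch in word)


def confusable_groups(words):
    """Return groups of words that become indistinguishable after neutralization."""
    groups = defaultdict(list)
    for word in words:
        groups[neutralize_voicing(word)].append(word)
    return [sorted(ws) for ws in groups.values() if len(ws) > 1]


if __name__ == "__main__":
    vocabulary = ["telo", "delo", "tochka", "dochka", "kolos", "golos", "xolost", "dom"]
    for group in confusable_groups(vocabulary):
        print(" ~ ".join(group))
```

On the sample vocabulary the sketch reports three merged pairs: delo/telo, dochka/tochka and golos/kolos. Words whose only neighbours differ in another feature, such as xolost, remain distinct.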


An essential fact is that the disintegration of the lexical structure of language 
caused by the break-down of phonemic auditory perception does not affect all morpho- 
logical elements of a word to the same degree. Observations by Bein (1950) in 
recent years show that such patients lose the power to understand the roots of words, 
of which there are a great many in each language and for the precise understanding 
of which it is specially indispensable to differentiate them from many other complexes, 
phonetically similar but having a different meaning ; on the other hand, suffixes of 
words, which are relatively few in a language and do not have such a multiplicity of 
possible phonetic connections, ordinarily remain considerably more comprehensible. 
Hence arises the paradoxical and apparently little-understood fact—so contrary to 
the widely held belief that in the pathology of these cases invariably those elements 
are earliest broken down which were last formed—that patients with the form of 
aphasia described lose the concrete meaning (attached to the root of the word) and 
retain abstract meanings, attached to such suffixes of abstract state, as -ost (vidim-ost) 
or -ie (sostoyan-ie, obrazovan-ie) in Russian, -heit (Treu-heit) or -keit (Nachbar-keit) in 
German, -ance (vigilance) in English or French, and so on. 

The potential retention of abstract terms in patients with such disorders,¹ and the
associated relative retention of abstract ideas, is one of the most interesting of those
phenomena that allow a close approach to the analysis of formations which arose first
on the basis of vocal speech, but which, with the development of human mentation,
have begun to take on a relative independence.

¹ The thesis just mentioned retains, of course, its significance only in the case of break-down of
the phonemic structure of speech in adults who have already formed the semantic system of
language. Early break-down (or underdevelopment) of phonemic hearing in a child, in whom
the semantic system of language has not yet been formed, leads to considerably more general
and serious disorders of mentation. This means only that in describing various forms of
secondary disorder, which are the result of a partial affection, one must always keep in mind at
what genetic stage this break-down took place and what system it found already in existence.
This idea, already expressed by Vygotsky (1956), has a fundamental significance for evaluation
of the systemic effect of disease, but a consideration of it is beyond the scope of this article.


COMMUNICATION OF EVENTS AND COMMUNICATION OF RELATIONS


There is no doubt that the careful investigation of these phenomena which arise as
a result of such disorders opens wide avenues for the analysis of important problems on
the borders of phonetics and morphology on the one hand and the psychology of speech
processes on the other. Information which a human being receives by means of
language is not confined to the designation of single objects; the most significant
information comes from systems of words or from whole communications. Since the
time of Svedelius (1897) it has been customary to divide these communications into
communication of events (of the type: “A dog bit the boy”) and communication of
relations (of the type “Socrates is a man” or “The circle is drawn under the square”
or “This is the father’s brother”). If the first type is expressed in European
languages mostly by means of the inflexion system, the second type can be expressed
either with the help of the most abstract forms of case relationship or with the help of
special types of preposition or conjunction.


Linguistic analysis has always indicated a difference in the degree of abstraction in 
the two kinds of communication and the fact that in each of them the object, the visual 
perception and abstract thinking do not participate to the same degree. However, in 
spite of the facts which indicate that they originated at different epochs and that they 
have different logico-grammatical structures, the distinction between them has 
remained relatively formal. 

In this problem of isolating the class of linguistic formations represented in the 
“ communication of relations ” and in analysing peculiarities of their inner structure, 
pathology can offer considerable help. Already at the time of Head (1926), Gelb and 
Goldstein (1924), Van Woerkom (1925), Bouman and Grünbaum (1925) and of a
number of other more specialised studies, it was noted that patients with disorders 
of the postero-parietal and parieto-occipital sections of the dominant left hemisphere 
show characteristic disabilities : although still able to understand everyday speech, 
they are incapable of understanding the meaning of complicated logico-grammatical 
combinations, which express certain abstract relationships. At the same time these 
patients show considerable difficulty in operating with spatial relations of objects ; 
they lose their ability to operate with mathematical categories, easily confuse arith- 
metical symbols and find themselves in great difficulties when confronted with the 
necessity of analysing numbers in structural arrangements. All these facts have been 
confirmed by research workers, who have given a closer description of the “ parietal 
syndrome ” (Konrad, 1932, Zucker, 1933, Klein, 1931, Gerstmann, 1932, Critchley, 
1953, Hécaen, 1953, Zangwill, 1957, and others). The description of symptoms 
occurring with parietal affections leads us to believe that with the help of pathology 
we shall be able to isolate what is specific to the complex form—the “ communication 
of relations”. However, the theoretical interpretation and explanation of these facts 
has confronted research workers with considerable difficulties. 


Some authors, adhering to the so-called “ noétic trend” (Goldstein, 1948, Van 
Woerkom, 1925, Grünbaum, 1925, and others) and in actual fact continuing the line
of Pierre Marie, have been inclined to see behind these symptoms a break-down of 
the “ general function of the intellect ” or of “ categorical thinking ” (a view which 
explained very little and has rather obstructed the way to further research). Other 
authors, with a more analytical attitude (Konrad, Zucker, Critchley, and Zangwill), 
have not been satisfied with these somewhat too general principles and have begun to 
look for a more specific brain mechanism as the basis of these disabilities. 
The solution of the problem has been materially advanced by observations that
have stated the facts more precisely and have allowed us to make a step forward 
towards an adequate explanation of this most complicated syndrome. 


A careful analysis of patients with limited lesions of the inferior parietal (or parieto- 
occipital) area has shown first of all that in these cases by no means all the forms of 
speech activity are broken down, and by no means all the forms of abstract thinking 
and behaviour. These patients, while still able to understand everyday speech denoting 
events, also preserve abstract notions which express inner psychological states and 
moral values, as well as some abstract categories. They begin however to experience 
considerable difficulties every time they have to analyse a complicated figure and to 
co-ordinate its details into one whole, or to operate with numbers in structural 
arrangements ; they confuse spatial direction when examining numbers of more than 
one digit and—a fact that is of particular interest for the problems with which we are 
directly concerned—they are nonplussed every time they are faced with a “com- 
munication of relations ”. 

All this has compelled us to reject the global and clearly incorrect assertion of the 
“ general lowering of the intellect ” or “ the loss of categorical thinking ”, supposedly 
resulting from these lesions, and, following our chosen direction, to seek the primary 
break-down of the higher nervous processes, which arises in the given local affection 
and which, as a secondary systemic effect, leads to the disorders just described. 

Experiments on animals have already shown that lesions of the posterior sections
of the cortex and especially of those areas which have acquired the name of “ posterior 
intrinsic areas ” (Rose, 1952, Pribram, 1957) break down the ability to react differenti- 
ally to complex configurations of signs and considerably restrict the information that 
can be received by the animals ; the work of Pavlov and his pupils with extirpations 
of the posterior sections of the cortex defined more accurately the physiological 
mechanisms which are at the base of these break-downs, and showed to what extent 
one set of functional systems may suffer a persistent heightening of sensitivity to 
stimulation whilst another distinct set of systems suffers a decrease in sensitivity. 

Such break-downs in the processes forming the complicated systems of discrimina- 
tion and occurring as a result of lesions of the parieto-occipital sections of the cortex 
have been described in great detail in man. Also, the basic disorder which occurs 
in these cases is illustrated by the phenomena of “simultaneous agnosia”, widely 
known in clinical practice (and described by Wolpert, 1924, and Balint, 1909) as well 
as by the phenomena of astereognosis which are similar in structure. The fact that 
patients lose their ability to “ combine details into a coherent whole ” (Head), that 
there is a loss of synoptic function, of the ability to “see elements in a single 
structure” (Goldstein) or to “transform the act of sequential observation into simul-
taneous observation” (Pötzl), all this only reflects those neurodynamic shifts which
in these cases cause a break-down in the systemic activity of the cortex so that the 
patient, as Pavlov said, “ finds himself in a condition to deal with only one point of 
stimulation at a time while the rest are inhibited ”. 
[Fig. 2, panel: FRONTO-TEMPORAL LESIONS]


Fig. 2. Disintegration of the operations connected with simultaneous and successive synthesis in
cases of affection of the posterior (parieto-occipital) and anterior (fronto-temporal) sections of
the hemisphere. The columns show the number of cases in which a given operation could be
carried out (upper columns) or could not be carried out (lower columns).
We have had occasion to describe a patient whose bilateral lesion of the parieto-
occipital area of the cortex led to his inability to combine two details simultaneously ; 
in a tachistoscope he could not perceive two letters at once, nor could he make a point 
in the centre of a cross or circle : one visual impression inductively inhibited the other 
and the simplest visual synthesis remained impossible for him. 

Such a break-down of simultaneous synthesis can be quite justifiably regarded as 
the physiological basis of functional disorders, occurring in lesions of the parieto- 
occipital areas of the brain. However, if the lesion goes beyond the limits of the 
parieto-occipital areas of the cortex and reaches the border of the speech area, it may
produce disorders which go far beyond the limits of the visual realm. They inevitably 
lead to a break-down of the more complicated mnemic operations, connected with 
abstraction of one signal and the synthesis of a series of elements in accordance with 
this abstracted signal. They also lead inevitably to a situation where operations which 
use a definite complex code and which are linked by means of this code with the 
intellectual act become impossible. 

As has been shown by special research, affections of the parieto-occipital areas 
of the cortex lead precisely to the inability to isolate one signal and to 
produce a stable system of heightened sensitivity to stimulation whilst inhibiting 
irrelevant systems. They produce in the patients a significant break-down in generalisa- 
tion from visual figures which are similar in one parameter (e.g., colour, shape, etc.) and 
different in others (Kok, 1957). Similar defects occur in these patients in their evalua- 
tion of space relations (Konrad, Zucker, Head, Critchley, Zangwill), in operations 
connected with counting off elements in space (compare finger agnosia and Gerstmann’s 
syndrome) and finally in arithmetical operations (especially in those connected with the 
analysis of numbers in structural arrangement) and counting, especially of numbers 
exceeding 10. These operations always require the abstraction of a scheme of reference 
and comparison of a whole complex of connections with this scheme. 

In all these cases, where the problem requires such complicated operations, affections 
of the parieto-occipital systems lead inevitably to their disintegration, at the same time 
conserving intact those operations which do not require such simultaneous observation 
of details and which can be realised by means of successive synthesis such as, for 
example, the beating out of rhythms, the reproduction of a series of successive move- 
ments, of the words of a poem, etc. On the other hand, when the fronto-temporal 
systems are affected, with the analysis of which we have dealt elsewhere in detail (Luria, 
1957a), operations associated with simultaneous synthesis remain completely intact, 
whilst operations which require a successive synthesis of elements (auditory or motor
in the first place) into one complete structure, suffer considerably (see Fig. 2). 


The above statement allows us to turn directly to the questions in which we are interested in this paper. We have already mentioned the two forms of language construction which some linguists call “communication of events” and “communication of relations”; now we can return to their analysis from the new standpoint which has just been clarified.
“Communication of events” in its simplest and purest form (of the type “A house is burning”, “A dog bit a boy”) depends on well-consolidated sequences of verbal relationships and does not require from the hearer a preliminary abstraction of one of the signals used in them, with the subsequent establishment of a complicated system of relations. Only in those cases where, as a result of inversion (e.g. “A boy was bitten by a dog”), the usual word order is affected and the order of communicated events ceases to coincide with the word order in the sentence¹ does a preliminary analysis of the construction become necessary; the isolation and comparison of its separate elements, and the comprehension of the construction on the basis of “the combination of isolated details into a coherent whole”, then correspond to the logico-grammatical code of the language.


The very opposite takes place in the understanding of a “communication of relations”. Even the simplest “communication of relations” (if it is not of the most common type) is a complicated and peculiar problem, which necessitates a comparison of two elements together with the extraction of the basic signal and the subsequent synthesis of the two elements into a specific structure.


A typical example of such a system is the construction “father’s brother” and the opposite construction “brother’s father”. In both cases it concerns two objects, described as “brother” and “father”. The construction as a whole does not refer to these objects, but to a third one, resulting from their logical combination: “uncle” in the first and “father” (defined from another, new relationship) in the second. In order to grasp this construction, which has developed relatively late, it is not sufficient simply to perceive two isolated designations; it is necessary to isolate the basic symbol (“father”) and to grasp the meaning of the second symbol (“brother’s”), having as a starting point its relation to the first. In some languages (e.g. Russian), which show these relations by means of the genitive attributive, it is necessary to carry out an operation of abstraction from the substantive meaning of the noun which is in the genitive and to convert it, according to its meaning, into a qualitative word (father’s brother = paternal brother).


A completely analogous psychological construction is that of the communication of spatial relations expressed by means of a preposition, e.g. “a circle under a triangle” and its opposite “a triangle under a circle”. In this case the simple designation of the two elements merely begins, but does not complete, the formation of the complex idea; for the understanding of the meaning of this construction it is also necessary to distinguish the basic object (e.g. “triangle”) and to evaluate the spatial relation of the second object (“circle”) to the first. Similar constructions are those communicating temporal relations (e.g. “summer following the spring” or “spring following the summer”) and instrumental relations (e.g. “the earth is lighted by the sun” or “the sun is lighted by the earth”), the correct forms of which may be established only after an analogous operation of distinguishing the basic designation, establishing the meaning of the second object, and then in turn establishing the meaning of the whole construction from the relation which has been analysed.

1 In the case of separated and subordinate clauses; the following sentence may serve as an example of such a construction: To the school where Peter is studying, a woman came from the factory to talk about the preparations for the holiday.


After what has been said it becomes quite clear that these constructions, which are very complex in form and came into being relatively late,¹ are very vulnerable and inevitably disintegrate when the basic condition indispensable for their understanding is unattainable as a result of the break-down of complex “simultaneous synthesis” and of the ability to combine isolated elements into a single analytic “unit”.


This is why patients with parieto-occipital lesions, especially those located on the border of the “speech zones of the cortex”, are often incapable of coping even with apparently simple “communications of relations”. They are baffled by the task of “drawing a square under a circle” (ordinarily carrying out the task agrammatically, in the order in which the words are spoken, and hence drawing a square and then a circle under it); they prove completely incapable of distinguishing between the two constructions “father’s brother” and “brother’s father”, declaring that in both cases the brother and father are spoken of and consequently the constructions are identical; and they readily accept the construction “the sun is lighted by the earth”, in which the habitual order of words in an active construction is agrammatically adopted as the correct one. This symptom, appearing distinctly in cases where the affection damages the most complex and most recently formed zones of the parieto-occipital region at its border with the temporal region, constitutes a basic symptom of so-called “semantic aphasia”. For the analysis of this condition, problems in understanding the simplest logico-grammatical relations occupy the same place that problems in the differentiation of opposed phonemes occupy in the analysis of temporal “acoustic aphasia”.


The close study of the changes that take place, in the cases we have considered, in the employment of the most complex codes of language (the phonetic code in the one case, the semantic code in the other) consequently permits the use of focal lesions of the brain as a means for the analysis of complex linguistic phenomena and of those functional formations which, without this method, would remain difficult of access to pathological analysis.


1 In some languages the emergence of such complex flexional or prepositional constructions goes back only as far as the XIVth-XVth centuries, and in earlier records and documents such constructions are replaced by simpler ones, not requiring any complex analytic-synthetic effort. Typical examples of such simplification, with the replacement of complex forms of government by simple paratactic constructions, are the Greek “They escaped the strength and arms of the Achaeans” (instead of “the strength of the Achaeans’ arms”) (Odyssey, VIII, 134); the German “mit Leidschaft und Liebe” (instead of “mit Leidschaft der Liebe”) (Nibelungenlied, 1148,3); and the English “For yonder I hear Sir Guy’s horn blow and (for ‘which’) has slain Robin Hood” (Ballad of Robin Hood).
THE EFFECT OF FRONTAL LESIONS ON SPEECH


We have surveyed the analysis of those break-downs in the complicated system of 
linguistic constructions caused by affections of the parieto-occipital systems of the brain 
which lead directly to the disintegration of simultaneous synthesis and to the break-down 
in the act of “ combining isolated details into a single whole ” which Head so rightly 
pointed out in his time. 

The logic of our exposition requires us now to turn to disturbances of the speech structures which arise in cases of disintegration in another important function, which, as all the data show (cf. Luria, 1957a), is provided for in the first place by the anterior fronto-temporal sections of the cortex, and which results in the synthesis of successive elements into a single continuous series or “dynamic system”. Such disturbances of sequential synthesis and of the acoustico-motor sequences directly linked with them (or, as they are often termed clinically, “kinetic patterns”) do not leave the speech systems unaffected either. But in this case, as we have shown elsewhere (Luria, 1947, chap. 4), the disintegration of complex speech formations proceeds along a completely different course; the patient, revealing no noticeable defects in distinguishing the phonetic elements of verbal speech or in grasping the logico-grammatical relations in language, begins to display noticeable break-downs in the smooth transition from subject to predicate and, consequently, in the realization of that “propositionizing” of which Hughlings Jackson spoke so fully and in such detail in his time. This break-down of sequential synthesis does not destroy single systems of stimulation but impedes the easy switching off of these stimulations and the transition from one system of innervation to another (in a pure form this break-down is seen in the so-called “pre-motor syndrome”). As a result of such disturbances there is a secondary affection of internal speech, which, as has been correctly affirmed by a number of psychologists, is predicative speech turned inwards (Vygotsky, 1956), and the presence of which is absolutely indispensable for fluent predicative propositionizing. It is these affections which eventually lead to the appearance of that exceptionally interesting phenomenon which is widely known in the clinical literature as “telegraphic style”, and which Jakobson with full justification considers a break-down of contextual speech, in many ways opposite to the break-down of the “code of language” (phonetic or logico-grammatical) with which we were concerned above. However, the limits of this paper do not allow us to concern ourselves in greater detail with this peculiar form of disorder or, what is most important, to throw light on the primary neurodynamic conditions of its appearance in the same way as we have attempted to do with respect to the forms of speech disturbance dealt with earlier.

We shall therefore cut short our exposition and pass to the consideration of those 
forms of the pathology of speech which in many respects have special interest and 
which have not received the attention they merit. Up to now we have been concerned 
with the analysis of information provided by focal lesions of the brain for the analysis 
of the structure of the speech processes and in particular for the intensive study of the 
phonetic and morphological, semantic and syntactic side of speech. However, apart 
from these essential aspects of the speech process, which make possible the use of
language as a means of communication and an instrument of thought, there is still 
another essential function of speech, which has received even scantier attention from 
linguists and psychologists. We have in mind the regulating function of speech, which, 
as we shall show below, remains relatively intact in the cases described above and 
which requires close consideration. 

When an adult addresses some order to a child, he calls into action in the child a 
system of governing connections, which direct all its subsequent behaviour and inhibit 
all irrelevant actions. The speech of the adult here regulates the behaviour of the 
child. This regulating effect, which the speech of the adult exerts on the child’s 
behaviour, eventually becomes the source of complex new functional formations. In 
acquiring his own speech, at first external and later internal, the child begins to use 
it not only as an instrument of communication and thought, but as a means of regulating 
his own behaviour. The child’s own speech, which helps him in orientating himself 
in his environment, and creating a system of connections in which he reflects reality 
and formulates his desires, enables him to map out his own behaviour and to regulate 
the course of his activity. Research carried out in recent years and treated by us in 
another place (Luria, 1957b) has made it possible to show what a complex path is 
traversed by the formation of this regulating function of speech before it becomes 
capable not only of setting in motion known and already consolidated actions, but of 
locking in the governing system of connections and checking all irrelevant actions, 
which do not relate to the fulfilment of the task formulated in speech. There is no doubt 
whatever that all the highest functional formations with which psychology is concerned 
—the accomplishing of conscious, purposive action, systematic active thought, voluntary 
memory—all these are in greater or less degree linked with the regulating function of 
speech. In all these cases external (or more often internal) speech locks in an existing 
system of connections, which in normal behaviour become dominant, and which define 
the course of all the subsequent actions of the person, acquiring sometimes—as, for 
instance, was the case with Giordano Bruno—a strength which considerably exceeds 
the strength of vital instincts. 

What does the study of pathological states of the brain contribute to the analysis of 
this very important but still little-studied regulating function of speech? Is that 
function broken down in equal degree in various brain lesions, or can we distinguish 
particular brain-systems, the preservation of which is absolutely indispensable for the 
regulating effect of speech on behaviour?

The cases of focal brain lesions, which we have analysed above, broke down the 
phonetic and lexical, semantic and syntactic side of speech, but did not yet lead to a 
distinct break-down of its regulating function. Patients suffering from the defects 
described above readily executed the doctor’s instructions, formulated in speech, 
concentrated for a considerable time on carrying out the tasks he set and frequently 
revealed an exceptional persistence in the task of compensating for their defects, without 
which the restoration of the broken-down functions would be impossible. Special 
experiments (as yet unpublished), carried out recently, have shown that in all these
cases the patient’s speech, broken down in the phonetic or grammatical respect, 
continues to preserve its determining, directing role, ensuring at the same time 
intelligently directed behaviour by the patient. 


In order to study the pathology of the regulating function of speech, it is consequently 
indispensable to go beyond the limits of the forms of speech disorder studied by us, 
and as we shall see below, in general beyond the limits of what is clinically known as 
“aphasia”.


For the successful solution of the question that interests us we should for a short 
time leave the description of speech disorders and consider the facts revealed by 
contemporary neurophysiology and psychology in studying the brain-mechanisms 
regulating the active behaviour of animals. 


All contemporary work which has studied the peculiarities of animal behaviour after the removal of separate parts of the brain, beginning with the work of Franz (1907), Bianchi (1921) and Pavlov and ending with the latest researches of Fulton (1945), Pribram (1957) and Anokhin (1949), leads, despite the divergence in the positions from which it started, to one essential fact. Ablation of the posterior sections of the major hemispheres causes in the animal a break-down of accuracy in the operation of separate analysers, which leads to a defect in particular types of differentiation and the restriction of corresponding information coming from the environment, but never leads to a break-down of the general regulation of the animal’s behaviour, which remains just as expedient as before. On the other hand, ablation of the frontal sections of the major hemispheres, though not causing any noticeable break-down in the action of separate parts of the exteroceptive apparatus, sharply alters the whole behaviour of the animal: it ceases to distinguish the essential in a situation and to assess its own experience; it reacts identically to stimuli which are important for life and to those which are indifferent; it continues to make movements which have become senseless after it has found the bait it was looking for (as was the case in the experiments of Anokhin, Pribram and Shustin). Evidently it does not take account in its behaviour of the general situation and the effect of previous experience. All this justified Pribram in saying that the frontal sections of the brain have special functions, determining the expediency (“utility function”) and the selective character of behaviour (“preference behaviour”), and allowed Anokhin to come to the conclusion that their function, closely connected with the regulation of motor behaviour, consists in the creation of a complex “pre-initiatory afferentation”, which includes signals concerning the successful completion of an act and makes possible the realization of preference behaviour.


The part played by the frontal sections of the brain in the regulation of complex forms of preference behaviour is thus shown to be indisputable, and even if the physiological mechanisms which ensure this regulation remain obscure up to the present, the distinguishing of this function of the frontal portions should be considered among the important achievements of neurological science.¹

If the purposive behaviour of an animal is determined by that “pre-initiatory afferentation” which in fact regulates all its subsequent activity, then in a human being such “pre-initiatory afferentation” consists in a system of governing connections formed by means of speech. It is therefore natural that in cases of affections of the frontal systems in man there should be a break-down not only of the system of synthetic “pre-initiatory afferentations” which determine his subsequent behaviour, but in the first place a break-down of the system of “pre-initiatory afferentations” created on the basis of speech; in other words, a break-down of the regulating function of speech connections.

Protracted research into the disintegration of the structure of behaviour in patients with lesions of the frontal portions of the brain, carried out by us together with a number of colleagues (Filippycheva, 1952; Meshcheryakov, 1953; Ivanova, 1953; etc.), has made it possible to describe the peculiar picture of disorders arising in these cases, which have never been classed as aphasia but which may be understood essentially as break-downs in the regulating function of speech.

As a rule, we do not observe in patients with lesions of the frontal portions of the brain (however severe and massive these affections may be) any noticeable break-downs whatever in the structure of speech processes: the phonetic system and vocabulary, the semantic system and the grammar of speech of these patients prove to be completely intact, both in their receptive and in their expressive aspects. It is only in patients whose brain affection is situated in the postero-inferior sections of the left frontal area adjoining Broca’s area that we observe a certain inactivity in speech, a break-down of monologue speech with preservation of responsive (dialogue) speech, which we have had occasion to describe elsewhere as a symptom of “frontal aphasia” (Luria, 1947).

And yet, despite its superficial intactness in these patients, the regulating function 
of speech proves to be severely disintegrated. This disintegration is revealed in patients 
with severe (and particularly bilateral) lesions of the frontal areas either by simple 
observation or in experiments involving their carrying out simple instructions. In 
many (the most severe) cases, such patients prove unable to carry out even the simplest instructions; though they may repeat in an echolalic way the experimenter’s order “Raise your hand”, they nevertheless make no attempt whatever to perform the required action. Sometimes the words of the experimenter may set in motion the action of the patient, but he proves to be powerless to stop it or to change the action once started to another. If such a patient is given the task of drawing a circle, he begins to carry this out, but then, submitting to the inertia of the stimulus in the motor regions of the cortex, he proves unable either to cease this movement or to change over to another action; when asked to draw a triangle, he correctly repeats this instruction, but continues through inertia to draw a circle. The inertia of an act once initiated and the weakness of the regulating effect of speech are also displayed in experiments with the carrying out of a complex of instructions: in response to the request to draw a triangle, a circle and two squares, the patient begins to draw a number of triangles, or a single triangle accompanied by several circles, although in some cases the patient remains capable of repeating the instruction.

1 It is interesting to note that Pavlov stresses the distinction of the functions of the occipital and frontal sections of the brain. “If you cut out the whole occipital part of the cerebral hemispheres of a dog,” he writes, “you will get an animal which is in general quite normal.... It wags its tail when you stroke it. It will also show its pleasure, by sniffing in recognition at you. But such an animal will be unable to react to you if you stand at a distance.... Such a dog has very little use from its eyes and ears, but for the rest, is completely normal. But if you cut out the frontal part of the cerebral hemispheres .... then you will have an obviously abnormal animal. It has no correct reaction to other dogs, nor to food, nor in general to surrounding objects. It is a completely ruined animal, evidently left with none of the signs of purposive behaviour.” I. P. Pavlov, Coll. Wks., III, 175-6.


Particularly distinct data, indicating a significant break-down of the regulating function of speech, were gathered from these patients in special experiments in which the speech of the experimenter did not set in motion an old, well-established connection, but locked in a new conditioned connection. If, as was shown by Meshcheryakov and Ivanova, such a patient was asked to press on a rubber bulb in response to a flashing light, or to press with the right hand in response to a red light but with the left hand in response to a blue light, these instructions continued to regulate his action only for a very brief period; the carrying out of the instruction would very soon cease, often being replaced by an inert repetition of stereotyped pressures, which would completely lose their relation to the preliminary signal and were carried out independently of it. Only by changing over to a prolonged insistence on the instruction, by means of frequent repetition of the command (“Press!” or “Don’t press!”) after each preliminary signal, was it possible to produce the formation of the required conditioned connection, and even this survived only a very short time, easily disintegrating in the course of the experiment and being replaced by persistent pressures which were not timed to the signal.


The break-down of the regulating role of instruction by speech appears most clearly in experiments with delayed reaction. When patients with a massive lesion of the frontal areas were asked to raise their hand in response to a knock produced 15-20 seconds after the instruction, they did so, but in cases of the most severe affection the regulating action of the instruction was broken down to such an extent that the movement set in motion by the instruction (the raising of the hand) was replaced by an imitative movement; the patient would reproduce the movement of the experimenter by knocking beside the place where the experimenter knocked. However, when such a patient was given an instruction requiring the production of a reaction in accordance with speech connections already set up (e.g. “When the second hand of the watch reaches 25, raise your hand”, or “When I count as far as 12, raise your hand”), the patient proved incapable of making the reaction; the best the patient could do was to declare “Now it’s reached 25!” or “That’s 12 now”, but he would make no movement. It is characteristic too that this fact is explained not by a break-down of memory (after the unsuccessful attempt the patient, when questioned, was easily able to reproduce the verbal instruction) but by heightened external inhibition, as a result of which the activity which had started (following the hand of the watch or listening to a series of numbers) inductively inhibited the influence of the instruction and produced a break-down of its regulating function.

A characteristic feature of these highly peculiar functional disorders is the fact that 
the phenomena here observed do not amount merely to the break-down of the regulating 
function of another person’s speech (or more accurately, the systems of connections 
set up by it). Experiments were carried out in which we asked the patient to confirm the effect of the instruction by his own speech, making him declare the significance of the stimulus, e.g. by saying every time the red lamp flashed: “I am to press!” and when the blue lamp flashed: “I am not to press!”, a method worked out with us by Khomskaya, whose influence in ontogenesis has been described elsewhere (Luria, 1956, 1957b). These experiments showed that even the patient’s own speech, easily passing over to an inert stereotype and losing the required connection with the signal, in fact also ceases to display its regulating effect. In these cases the patient either begins inertly to alternate the responses “I am to press!”, “I am not to press!”, or (if his speech reactions were sufficiently strong) correctly reproduces the required responses but ceases to make the movements corresponding to them, replacing them by inert stereotypes.

The break-down in the regulating function of speech proves characteristic of the 
whole behaviour of such patients; it deprives their activity of the required purposiveness
and leaves an imprint on the whole character of their broken-down intellectual
processes. 

The close study of this form of speech disorder, which is empirically well known 
to clinical workers, but which has not received the attention it merits, is still in its 
infancy. But there can be no doubt that systematic research into the method of 
formation of the regulating effect of speech in ontogenesis, how it is brought about in 
normal behaviour and how it disintegrates in pathological conditions of the brain, will 
reveal a whole series of facts of considerable interest both for psychology and for that 
branch of science which deals with the pragmatic forms of speech activity. 

We have thrown light here on only a few problems which are revealed to the 
psychologist when he uses the observation of pathological states of cerebral activity 
as a method permitting the discovery of some mechanisms of the speech processes which 
are internal and not easily approached by pathological investigation. 

It would be a mistake—though unfortunately a common one—to think that patho- 
logical states of the brain return speech to stages it has once passed through and allow 
one to follow out the history of its formation in reverse. Pathological changes in 
cerebral activity break down one or another physiological condition, indispensable for 
the normal existence of speech processes; therefore in fact they never reproduce any
of the earlier stages of speech development. But, “breaking down and simplifying what
is fused and indivisible in the physiological norm”, they permit the use of this method
as an important means of analysing the psychological construction of speech and the 
actual forms in which language is used. 
THE SOLUTION OF SOME FUNDAMENTAL PROBLEMS IN MECHANICAL SPEECH RECOGNITION *
D. B. FRY AND P. DENES
University College, London


This paper offers a theoretical treatment of the main problems that arise in 
mechanical speech recognition, based on the conclusions reached in experiments on 
the perception and recognition of speech sounds and on experimental results already 
obtained with a mechanical recognizer. In the first part of the paper, the problems 
of primary or acoustic recognition are dealt with; they include the “gating” problem,
the choice of recognition units, and the acoustic recognition of different classes of
speech sound: vowels, plosive consonants, fricative consonants and periodic
continuants. The second part discusses the use of language statistics in mechanical 
recognition. 


THE PURPOSES OF MECHANICAL SPEECH RECOGNITION 


The development of a machine which will “recognize” speech sounds implies the
building of a model that reproduces the operation of some part of the human speech 
mechanism. A machine of this kind has been described in previous publications 
(Fry and Denes, 1953, 1955, 1957) ; the purpose of the present paper is to discuss 
some of the fundamental problems that are involved and to show how these have been 
solved or may be solved through a consideration of the human mechanism. 

It is necessary first to define the function of a mechanical speech recognizer and 
this can be done only by reference to human communication. A human being can 
listen to speech and write down what he hears, or type from dictation. The trans- 
formation effected in this case requires first the recognition of linguistic elements on 
the basis of the acoustic input and then the re-encoding of this sequence of elements 
in the form of a letter sequence. To carry out this process of recognition and encoding 
is the function of a mechanical speech recognizer. The fact that in this example the 
input sequence is re-encoded in a visible form is not material to the argument. In 
another common instance, the human being re-encodes the input by actuating the 
mechanism of a teleprinter, and so encodes the input as electrical impulses which can 
be transmitted over a particular type of channel. The essential function of a 
mechanical recognizer therefore is to perform the recognition and to make the results 
of recognition available for any further transformation that may be required. 
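The division of labour described above (recognition first, then re-encoding into whatever output form is required) can be sketched in a few lines. This is a toy sketch under stated assumptions, not the machine described by the authors: the input is assumed to be already reduced to one (F1, F2) formant pair per segment, the three vowel templates and their values are illustrative, recognition uses simple nearest-template matching (a technique chosen here only for brevity), and the numeric codes standing in for a teleprinter channel are hypothetical.

```python
import math

# Hypothetical (F1, F2) vowel templates in Hz. The values are rough averages of the
# kind reported for adult male speakers and are illustrative assumptions only.
VOWEL_TEMPLATES = {
    "i": (270.0, 2290.0),
    "a": (730.0, 1090.0),
    "u": (300.0, 870.0),
}

# Hypothetical numeric codes standing in for a teleprinter channel (not a real code).
HYPOTHETICAL_CODES = {"i": 1, "a": 2, "u": 3}


def recognize(frames):
    """Stage 1 (recognition): label each (F1, F2) frame with the nearest template.

    Nearest-template matching is swapped in for illustration; it is not
    the method of the recognizer described in the paper.
    """
    labels = []
    for frame in frames:
        nearest = min(VOWEL_TEMPLATES,
                      key=lambda v: math.dist(frame, VOWEL_TEMPLATES[v]))
        labels.append(nearest)
    return labels


def encode_as_letters(labels):
    """Stage 2, option A: re-encode the recognized units as a letter sequence."""
    return "".join(labels)


def encode_as_codes(labels):
    """Stage 2, option B: re-encode the same units as numeric channel codes."""
    return [HYPOTHETICAL_CODES[label] for label in labels]


if __name__ == "__main__":
    spoken = [(280.0, 2250.0), (700.0, 1100.0), (320.0, 900.0)]
    units = recognize(spoken)
    print(units)                     # ['i', 'a', 'u']
    print(encode_as_letters(units))  # iau
    print(encode_as_codes(units))    # [1, 2, 3]
```

Because both encoders consume the same list of recognized units, the output form (typed letters or channel codes) can be changed without touching the recognition stage, which mirrors the point that re-encoding in a visible form is not material to the argument.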

The experiments referred to in this paper were made with a machine designed to 
fulfil this function. This mechanical recognizer receives the sound waves of speech, 
processes them and then types a sequence of letters to correspond with the sound 
sequence spoken into it. It is, therefore, a model which can replace functionally 
the human being in the chain: acoustic speech input → human being → typewriter
or teleprinter.

* Some of the problems dealt with in this paper are being explored under Contract AF 61 (514)-1176
with the Air Research and Development Command, U.S. Air Force.

Construction of a model necessarily involves some consideration of the
purposes which models may serve and the limitations to which they are subject. The
recognizer, which has been described elsewhere, consists of electronic and mechanical 
components. It is clear that such a model is incapable of any operation or series of 
operations which cannot be predicted and in this sense the model itself cannot 
provide new knowledge. This is true of all models that simulate the operation of 
human mechanisms. There are, however, three ways in which such a model may 
be of value. It may, first, serve as a demonstration that a mechanism of a particular 
type can produce a given end-result, that is, it may function as a terminal analogue. 
If the mechanical recognizer takes in the sound-waves of speech and yields the same 
series of typed letters, or the same set of coded signals as are produced by the 
human operator, it can then be stated that the end-result given by the human being 
can also be provided by a mechanism of the type and with the characteristics of the 
mechanical recognizer. Second, the model may perform the very useful function of 
a computer. Though the characteristics and mode of action of each component may 
be known, it may be a long and laborious task to derive from this knowledge predic- 
tions of the behaviour of the system as a whole, particularly in the case of a model 
containing many component units. In such a case, the machine computes the effect 
of combining a series of operations in a specified manner and the results of the 
computation can be obtained from a study of the output from the model. Third, the 
model may itself be used in purely practical ways, which are independent of the 
theoretical basis of its action. It may perhaps be most appropriate to discuss the 
practical applications of a mechanical speech recognizer before going on to the more 
theoretical discussion, which forms the major part of this paper. These practical
applications lie in the field of telephony. 


Research has now been going on for a long time which aims at exploring the 
possibility of transmitting recognizable speech information over communication 
channels that have as small an information-carrying capacity as possible. The human 
speaker, when he encodes a phonetic sequence in an acoustic form by the operation 
of his speech mechanism, does the encoding in a way that requires a communication 
channel of about 4000 c.p.s. bandwidth and a signal to noise ratio of about 30 db. 
for recognizable transmission of the speech information. This requires a channel
capacity of about 40,000 bits/second.


Several analysis-synthesis speech transmission systems are in existence that reduce 
the channel capacity required to transmit speech by attempting to simplify the 
acoustic wave that reaches the listener. In a formant tracking type of transmission 
system, for example, the only features of the original speech wave that the listener 
receives are the fundamental frequency and the first three formants. If we assume 
that such a system needs five channels of 20 c.p.s. bandwidth each and a signal to
noise ratio of 30 db., the total channel capacity required will be 1000 bits/second.
The method used in this and all other types of analysis-synthesis systems is to present 
the listener with a simplified version of the original speech wave and leave him to do 
all the linguistic interpretation. 
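Both capacity figures above can be checked against the Shannon-Hartley formula, C = B log2(1 + S/N). The following Python sketch assumes that the 30 db. figure is a power ratio (so S/N = 1000) and that the capacities of the five formant-tracking channels simply add; the function name is illustrative only.

```python
import math


def capacity_bits_per_second(bandwidth_cps: float, snr_db: float) -> float:
    """Shannon-Hartley channel capacity C = B * log2(1 + S/N)."""
    snr_power_ratio = 10 ** (snr_db / 10)  # 30 db. -> a power ratio of 1000
    return bandwidth_cps * math.log2(1 + snr_power_ratio)


# Ordinary speech channel: about 4000 c.p.s. at about 30 db.
speech_channel = capacity_bits_per_second(4000, 30)

# Formant-tracking system: five channels of 20 c.p.s. each, assumed to add.
formant_tracking = 5 * capacity_bits_per_second(20, 30)

print(f"speech channel:   {speech_channel:,.0f} bits/second")
print(f"formant tracking: {formant_tracking:,.0f} bits/second")
```

The sketch gives just under 40,000 and just under 1000 bits/second, in line with the round figures quoted in the text.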
A mechanical recognizer works on a different principle. It interprets the sound
wave in terms of phonemic units and transmits information about these units rather 
than about features of the acoustic wave. Such a system would effect a very large 
saving in the channel capacity required. Assuming that there are 40 phonemes in 
English and that they are being sent at the rate of 10 per second in ordinary speech, 
the channel capacity required will be about 55 bits/second. This, assuming a channel 
signal to noise ratio of 30 db., would represent a bandwidth of about 6 c.p.s. In such 
a system the information transmitted about the phonemic sequence would control a 
speech synthesizer so as to produce artificial speech sounds for the listener. It could 
equally well be used to control a typewriter and to present a sequence of printed 
letters. 
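The phonemic figures can be reproduced in the same way. This is a minimal sketch under the text's assumptions (40 phonemes sent at 10 per second) plus one further assumption of our own: that every phoneme is equally likely and independent of its neighbours, which gives an upper bound that the constraints of English would lower. The required bandwidth follows by inverting C = B log2(1 + S/N) at 30 db.

```python
import math

PHONEMES = 40             # size of the English phoneme inventory assumed in the text
PHONEMES_PER_SECOND = 10  # rate assumed for ordinary speech
SNR_DB = 30

# Upper bound: every phoneme treated as equally likely and independent.
bits_per_phoneme = math.log2(PHONEMES)
phonemic_rate = PHONEMES_PER_SECOND * bits_per_phoneme

# Invert C = B * log2(1 + S/N) to find the bandwidth that carries this rate.
bits_per_cycle = math.log2(1 + 10 ** (SNR_DB / 10))
bandwidth_cps = phonemic_rate / bits_per_cycle

print(f"{phonemic_rate:.1f} bits/second in {bandwidth_cps:.1f} c.p.s.")
```

The result, about 53 bits/second in a little over 5 c.p.s., rounds to the figures of about 55 bits/second and 6 c.p.s. given above.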

The achievement of such practical ends requires a better understanding than we 
yet possess of the fundamental processes involved. The rest of the paper will be 
concerned with these. We shall be dealing with observations of human mechanisms, 
hypotheses about these mechanisms and experimental results from the electro- 
mechanical analogue. The procedure will be to discuss the basic difficulties which 
stand in the way of successful mechanical recognition, to examine the means by which 
the human being overcomes these difficulties — either as a matter of observed fact 
or of hypothesis — and to show how these methods have been or may be used in 
the machine. 

Several basic features of the human method of recognizing speech sound have 
been extensively treated elsewhere (Fry, 1956, 1957) and can therefore be summarized 
here. There are two stages in the recognition of speech sounds, first that of primary 
recognition which the listener bases on his perception of incoming sounds, and second 
the application of statistical knowledge (concerning sequential probabilities) to the 
results of primary recognition. At both these stages, the listener takes advantage of 
the high degree of redundancy that speech exhibits as a form of communication. There 
is a plurality of cues in the acoustic speech input available for use in primary recognition 
and the listener uses different combinations of these cues as occasion demands. At 
the second stage, the listener’s linguistic knowledge is used to confirm the results of 
primary recognition, to correct errors or to fill gaps. In the discussion of mechanical 
speech recognition, the problems connected with these two stages will be considered 
in succession. 
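The two stages can be illustrated with a small sketch. It assumes that primary recognition delivers a score for each candidate phoneme in each segment, and it approximates the listener's knowledge of sequential probabilities with a table of digram probabilities. All symbols and numbers are invented, and the Viterbi search used to combine them is a modern technique chosen for illustration, not the procedure of the machine described here.

```python
from typing import Dict, List, Tuple

# Hypothetical primary-recognition scores, one dictionary per segment.
acoustic: List[Dict[str, float]] = [
    {"s": 0.6, "t": 0.4},
    {"i": 0.5, "u": 0.5},  # primary recognition cannot decide between /i/ and /u/
    {"k": 0.7, "t": 0.3},
]

# Hypothetical sequential probabilities P(next phoneme | previous phoneme).
digram: Dict[Tuple[str, str], float] = {
    ("s", "i"): 0.6, ("s", "u"): 0.2, ("t", "i"): 0.3, ("t", "u"): 0.25,
    ("i", "k"): 0.5, ("i", "t"): 0.4, ("u", "k"): 0.2, ("u", "t"): 0.3,
}


def best_sequence(acoustic, digram, floor=1e-6):
    """Viterbi search: combine acoustic scores with digram probabilities and
    return the phoneme sequence with the highest joint score."""
    # For each possible last phoneme, keep the best-scoring path ending in it.
    paths = {sym: (score, [sym]) for sym, score in acoustic[0].items()}
    for segment in acoustic[1:]:
        new_paths = {}
        for sym, ac_score in segment.items():
            score, seq = max(
                (
                    (prev_score * digram.get((prev, sym), floor) * ac_score, prev_seq)
                    for prev, (prev_score, prev_seq) in paths.items()
                ),
                key=lambda item: item[0],
            )
            new_paths[sym] = (score, seq + [sym])
        paths = new_paths
    return max(paths.values(), key=lambda item: item[0])[1]


print(best_sequence(acoustic, digram))
```

Acoustically the second segment is a tie, but the digram table favours /i/ after /s/, so the sketch returns /s i k/: the statistical stage has filled a gap left by primary recognition.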


THE ACOUSTIC RECOGNIZER 


In the mechanical recognizer, the action of primary recognition in the human being 
is simulated by straightforward inspection of the physical input to the system. The 
quality dimension with which the listener is most concerned is represented, as in many 
machines of this type, by the pattern of spectral distribution of energy in the incoming 
wave-motions. The speech input is analysed by means of a bank of filters covering 
the range from 100 to 8000 c.p.s. in adjacent bands. The output of each filter is 
rectified and thus a set of varying voltages is obtained which reflect the energy level 
in each filter at succeeding moments. The occurrence of a given sound in the speech
input is found to give rise to maxima of energy in a particular pair of filters, the 
sound /a:/, for example, in the filters numbered 7 and 10, the sound /ʃ/ in filters
numbered 12 and 17 (see Fig. 1). The recognizer is at present dealing with a 
restricted repertory of 14 sounds, and there are therefore 14 pre-selected pairs of 
filters. The outputs of the two filters in a pair are multiplied together and this gives 
14 voltage products. The relative magnitude of these products varies as changes in 
the speech input move the energy peaks from filter to filter and this fact provides 
the machine with the means of simulating the human listener’s primary recognition 
on the basis of sound quality. The pre-selected pairs of filters constitute the patterns 
of spectral distribution of energy stored in the machine. The spectrum of the speech 
input is compared with these patterns and the best match selected by finding the 
greatest of the 14 voltage products. This operation is carried out by an electrical 
maximum detector circuit. 
Fig. 1. Schematic diagram of the acoustic recognizer.
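The matching operation described above can be sketched in a few lines of Python. The sketch assumes that the rectified filter outputs for one moment are available as a list of numbers indexed by filter number. Only the /a:/ pair (filters 7 and 10) comes from the text; the other two pairs are hypothetical stand-ins for the remaining entries of the 14-sound repertory.

```python
from typing import Dict, Sequence, Tuple

# Filter numbers are 1-based, as in Fig. 1. Only the /a:/ pair is from the
# text; the /i/ and /s/ pairs are hypothetical placeholders.
FILTER_PAIRS: Dict[str, Tuple[int, int]] = {
    "a:": (7, 10),
    "i": (3, 15),   # hypothetical
    "s": (16, 18),  # hypothetical
}


def pair_products(energies: Sequence[float]) -> Dict[str, float]:
    """Multiply the rectified outputs of each pre-selected filter pair."""
    return {
        sound: energies[first - 1] * energies[second - 1]
        for sound, (first, second) in FILTER_PAIRS.items()
    }


def maximum_detector(energies: Sequence[float]) -> str:
    """Return the sound whose filter-pair product is greatest."""
    products = pair_products(energies)
    return max(products, key=products.get)


# A moment of input with energy peaks in filters 7 and 10.
frame = [0.1] * 18
frame[7 - 1] = 0.9
frame[10 - 1] = 0.8
print(maximum_detector(frame))  # a:
```

Multiplying the two outputs, rather than adding them, means a pattern scores highly only when both of its filters carry energy at once.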


THE “GATING” PROBLEM 


Given some such method of primary or acoustic recognition, a fundamental problem 
arises at once, that of instructing the machine at what moments to make a fresh 
recognition operation. This we may call the “gating” problem.
The human operator who listens to speech receives at his ear-drum a continuously
varying wave-motion. If he types what he hears, he supplies at the output a sequence
of discrete symbols and we have therefore at some stage in the complete process a 
transformation of a continuous input into discrete units. A common mechanism for 
obtaining such a transformation is one which simply integrates over some period and 
thus introduces discontinuities. Such devices are working all the time in the human 
operator; for example, at a very early stage, when the wave-motion received by the
ear-drum is sent along the acoustic nerve trunk, it is already transformed into discrete 
nerve impulses and it is in a similar form that the signal is dealt with in the brain. 
These integrations, however, simply reflect the time-constants of the physiological 
circuits which are at work; they are relatively invariant for a given circuit and are
independent of the time pattern of the inputs. In the recognition of a sequence of 
speech sounds the necessary discontinuities are introduced by integrating along 
psychological dimensions and any time integration that occurs is one in which the 
integration time is much longer and, above all, is not a constant. If we take the 
spectrogram of an utterance, e.g. the English word /mikanikli/, we may arrive at a 
time scheme for the utterance by noting the epoch of each gross change in the pattern. 
A typical set of time values for the intervals between successive changes is the 
following: 40, 60, 70, 30, 130, 40, 70, 80, 50, 40, 200 msecs. Although the acoustic
segments are easily distinguished in the spectrogram, it is wrong to assume that the 
recognition of the phonemic sequence corresponds closely with the incidence in time of 
the various patterns. Experiment has shown that the recognition of many phonemes is 
related to cues derived from several of these acoustic segments. In view of this 
fact and of the varying durations of these segments it seems probable that the time 
interval between phonemic decisions varies. If this is so, phonemic recognition cannot 
be a matter of simple integration with time; as no constant time factor is involved,
the brain must include a recognition mechanism which integrates with respect to some 
other dimension. It has been suggested by one of us (Fry, 1956) that there is a 
hierarchy of psychological dimensions employed in the recognition of speech sounds 
and that integration with respect to sound quality is the most probable mechanism in 
primary recognition. During the early part of the utterance /mikanikli/, for example, 
the brain perceives a particular sound quality; there will be no decision by the
recognition mechanism so long as there is no major change in quality. At approxi- 
mately 40 msecs. after the beginning of the utterance such a change occurs and this 
is a signal for a decision. In this instance we know that the decision, the recognition 
of the sound /m/, is not actually made until some time later and in no case can we 
make definite statements as to the moment at which such a decision is taken. We can 
however state the number of decisions taken in the whole period of a given utterance 
and we can set some time limits for each decision. A mechanism such as we have 
just postulated would produce the results which are in fact provided by the human 
recognizer and if we could successfully simulate this action in a machine we should 
overcome one of the major difficulties in the way of mechanical recognition. 
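Turning the quoted intervals into a time scheme shows how far from constant they are. This sketch uses only the values given in the text for /mikanikli/; the first change falls at 40 msecs., as stated, and the longest segment is more than six times the shortest.

```python
from itertools import accumulate

# Intervals (msecs.) between successive gross changes in the spectrogram
# of /mikanikli/, as quoted in the text.
intervals = [40, 60, 70, 30, 130, 40, 70, 80, 50, 40, 200]

# Moment of each change, measured from the onset of the utterance.
epochs = list(accumulate(intervals))

print(epochs)                          # [40, 100, 170, 200, 330, 370, 440, 520, 570, 610, 810]
print(min(intervals), max(intervals))  # 30 200
```

A recognizer with a fixed integration time would have to match the 30 msecs. segment and would then sample the 200 msecs. segment several times over.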
In an electronic analogue the simplest mode of operation is to allow the machine
to repeat its recognition process at regular intervals. The machine is arranged to have 
a time constant which can be set at some convenient value; it will then integrate
effects over an invariant time interval and will make a decision at the end of each
cycle. There are serious drawbacks to such a mode of operation. The time constant
would have to be no longer than the duration of the shortest sound; many of the
sounds have a much longer duration and such a principle then introduces errors of 
various kinds. A short sample of an /s/, for example, would most probably be 
recognized as a /t/ and so the output would show a succession of t’s where the input 
contained /s/. A device which employs a standard integration time is that of Dreyfus- 
Graf (1950). Here the cycle is short and the output contains repetitions of the same 
symbol. This is a disadvantage even in cases where the symbol is the “right” one,
that is, where it corresponds to the phoneme input.


A more general disadvantage of such systems is that the output of the recognizer 
working with an invariant cycle is necessarily sensitive to changes of tempo in the 
speech input, since these changes vary the relationship between the time constant of 
the machine and the duration of the sounds it has to recognize. For this and the 
other reasons already given, therefore, it seems desirable in designing a recognizer 
to get away if possible from the invariant cycle type of operation, and this has been 
achieved in the machine under discussion in this paper. 
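Both drawbacks of an invariant cycle appear in a toy simulation. It assumes, hypothetically, that the maximum detector's reading is available every 10 msecs. and that the machine reports whichever reading holds at the end of each cycle. It leaves aside the misrecognition of short samples (an /s/ taken for a /t/) and shows only the repetitions and the dependence on tempo.

```python
from typing import Sequence


def fixed_cycle_output(frames: Sequence[str], cycle_frames: int) -> str:
    """Report one decision at the end of every cycle, whatever the input does."""
    return "".join(
        frames[end - 1] for end in range(cycle_frames, len(frames) + 1, cycle_frames)
    )


# Hypothetical readings at 10 msecs. intervals: 120 msecs. of /s/, then 80 msecs. of /i/.
normal = ["s"] * 12 + ["i"] * 8
# The same sequence spoken at twice the tempo.
fast = ["s"] * 6 + ["i"] * 4

print(fixed_cycle_output(normal, cycle_frames=3))  # ssssii
print(fixed_cycle_output(fast, cycle_frames=3))    # ssi
```

Two phonemes come out as six symbols at one tempo and as three at another, so the output depends on the speaker's rate as well as on the sounds.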


We have already seen that the machine carries out a process of pattern matching 
in the form of finding the maximum of a set of voltage products derived from pairs 
of filters. Suppose that the input is the sound /s/, then the pair of filters associated 
with this sound will give rise to a product higher than that of any of the remaining 13 
products. This fact is established by means of the maximum-detector, and the decision 
is registered that the incoming sound is /s/. No further decision is now made so 
long as the product of the /s/ pair of filters remains the greatest. If the sound /s/ 
is followed by the sound /i/ in the input, then, at a given moment, there will be a 
change in the output products, the greatest of them will no longer be provided by the 
/s/ pair of filters but by the /i/ pair. The maximum-detector notes this change and
a fresh decision is now registered that the incoming sound is /i/. This device is 
therefore able to deal with the continuous wave-motions of speech and to transform 
them into a series of discrete units without being tied to a regularly repeated cycle of 
operation. It is a self-gating device and its repeated operation is governed not by the 
amount of time that has elapsed but by the magnitude of change in another dimension 
— in this case, change in the spectral distribution of energy. 


It is obvious that such an arrangement must have a threshold of operation, that is, 
there must be a minimum change in the relation of the voltage products that will 
lead to a fresh decision. This is presumably true also in the human case, since there 
are changes of quality in the course of a sound that we disregard for the purpose of
speech recognition. In the mechanical recognizer, this threshold has been largely
imposed by the nature of the associated maximum detector circuit and this value has
been found empirically to give good results. 

There are inevitably some problems of segmentation that cannot be resolved by 
this type of gating. When the same sound is repeated, as it is at word junctions in 
sequences like will look, or six sevens, the machine would fail to make a fresh decision 
for the second /l/ or /s/, but such cases are comparatively infrequent and this type 
of error is not serious from the point of view of the reader who is dealing with the 
final output of the machine. 
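The self-gating principle, its threshold and its failure on repeated sounds can be sketched together. The sketch assumes that the voltage products are sampled as a sequence of dictionaries. It models the threshold, hypothetically, as a fixed margin by which a rival product must exceed the product of the sound currently registered; in the machine the threshold arises from the maximum detector circuit itself.

```python
from typing import Dict, List, Optional, Sequence


def self_gated_decisions(
    frames: Sequence[Dict[str, float]], threshold: float
) -> List[str]:
    """Register a fresh decision only when a different sound's product exceeds
    the product of the currently registered sound by more than `threshold`."""
    decisions: List[str] = []
    current: Optional[str] = None
    for products in frames:
        best = max(products, key=products.get)
        if current is None:
            current = best
            decisions.append(best)
        elif best != current and products[best] - products[current] > threshold:
            current = best
            decisions.append(best)
    return decisions


# Hypothetical products for /s/ followed by /i/, with one brief fluctuation.
frames = [
    {"s": 0.80, "i": 0.10},
    {"s": 0.70, "i": 0.72},  # small change of quality within /s/
    {"s": 0.75, "i": 0.20},
    {"s": 0.20, "i": 0.90},  # genuine change to /i/
    {"s": 0.10, "i": 0.85},
]

print(self_gated_decisions(frames, threshold=0.0))  # ['s', 'i', 's', 'i']
print(self_gated_decisions(frames, threshold=0.1))  # ['s', 'i']
```

With no margin the fluctuation in the second reading produces a spurious /i/ and a second /s/; with a margin of 0.1 only the genuine change is registered. Because decisions are triggered only by a change of maximum, a stretch in which one product stays greatest, such as the two /s/ sounds at the junction of six sevens, yields a single decision.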


THE CHOICE OF RECOGNITION UNITS 


The technique just outlined provides one satisfactory solution to the problem of 
transforming the continuous speech input into a series of discrete signals. The result 
is a sequence of quasi-phonemes which can be made use of in the next stage of the 
machine. This fact does not in itself settle the question of what size of unit should be
adopted for the final recognition operation. The human listener uses phonemes in 
order to build up units of higher magnitude — morphemes, words and sentences — 
and it is necessary to consider whether units corresponding to any of these might be 
more satisfactory as a basis for recognition. 

It is generally true that the further one proceeds through a linguistic sequence, the 
stronger are the constraints which operate. This means that the longer the unit that 
is dealt with, the more certainly would one arrive at correct identification (leaving aside 
for the moment the question of sequential errors). It may be easier to identify
correctly a syllable /pik/ as a unit than to identify /p/ and then /i/ and then /k/, 
because not all combinations of phonemes are possible as syllables. Similarly, the 
word /piktʃə/ may be recognized more certainly than its component phonemes or
syllables, and the sentence /hi:z bo:t ə piktʃə/ more certainly than its component
words. There are however other considerations that affect the choice of unit in a 
recognizer. The first is a strictly practical one, and that is the problem of storage. 
Whatever the size of unit concerned, recognition depends on matching an incoming 
pattern with one of a set stored in the machine, and the amount of storage needed is 
an important consideration. The choice of longer units for recognition would affect 
the storage required in two ways. First, each pattern would be longer and thus 
require more storage space, and second, a much more important factor, the total 
number of units to be stored increases very rapidly with the length of the unit. In 
English, for example, a total of 40 units would cover the whole phonemic system, but 
it would be necessary to store a very much larger number, certainly of the order of 
thousands, if the syllable, morpheme or word were adopted as the recognition unit. 
The number of possible sentences would be very much larger still and require 
correspondingly greater storage. Although pattern storage with present-day techniques 
is not very difficult, it is clear that it is not economical to adopt the larger units for 
recognition and it is questionable whether reduction in the number of errors obtained 
in this way would warrant the increase in storage. 
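The growth in the number of patterns to be stored can be made concrete with an upper bound: with 40 phonemes there are at most 40^n distinct sequences of n phonemes. The sketch ignores the constraints of English, which rule out most of these combinations, so it overstates the real totals; it shows only how quickly the bound grows with the length of the unit.

```python
PHONEMES = 40


def max_patterns(length: int) -> int:
    """Upper bound on distinct phoneme sequences of the given length."""
    return PHONEMES ** length


for length in range(1, 5):
    print(length, max_patterns(length))
```

Each additional phoneme in the unit multiplies the bound by 40, from 40 patterns for single phonemes to 2,560,000 for sequences of four.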
There is a further reason for the adoption of the phonemic unit as the basis for
recognition, and this is to be found in the “inclusive” nature of the recognition 
operation in all the machines devised so far. In every device a restricted repertory 
of recognition units is set up and all inputs to the system are treated as though 
composed entirely of a sequence of these units. The Bell Laboratories Automatic 
Digit Recognizer (Davis, 1953), for example, is designed to deal with ten words, and 
any acoustic input to the system will be “recognized” as a sequence of these words.
The device has no mechanism which will reject an input on the grounds that it does 
not correspond to any of the stored patterns, and indeed, it would probably never be 
worth while to use such a mechanism. In the case of the Automatic Digit Recognizer, 
this feature does not affect the usefulness of the machine since in practice it would 
always receive sequences of digits at the input. The “inclusive” nature of mechanical
recognition has some bearing on the choice of recognition units if we consider the 
ideal case in which the recognizer may deal with any sequence in a given language. 
No matter how large a store of word or sentence patterns the recognizer contained, 
it would always be likely that words or sentences outside this repertory would occur 
in the input and errors would arise because these were forced into the recognition 
scheme of the machine. In adopting phonemic units as the basis for recognition, we 
not only have a much more manageable storage problem, but we can be sure that 
the input will in fact consist only of sequences of these units and will not contain items 
which are not properly classed within the repertory of the machine. We thus avoid 
this type of error which is bound to arise as soon as the number of stored patterns 
is less than the possible number of different items occurring in the input. 
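The “inclusive” property follows from any recognizer that simply picks the closest stored pattern. The sketch below uses three hypothetical word patterns represented as invented feature vectors; it is not the method of the Automatic Digit Recognizer, only an illustration that such a scheme has no way to refuse an input.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical stored patterns: a label and an invented feature vector each.
STORED: Dict[str, Tuple[float, ...]] = {
    "one": (0.9, 0.1, 0.2),
    "two": (0.2, 0.8, 0.1),
    "three": (0.1, 0.3, 0.9),
}


def recognize(features: Sequence[float]) -> str:
    """Inclusive recognition: the nearest stored pattern always wins, so an
    input that matches none of them still receives one of the stored labels."""
    return min(STORED, key=lambda label: math.dist(features, STORED[label]))


print(recognize((0.9, 0.1, 0.2)))  # one
print(recognize((0.5, 0.5, 0.5)))  # two, although the input is none of the words
```

An input from outside the repertory is forced into the nearest class, which is the source of error the text avoids by choosing phonemes, a repertory that covers every possible input.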


THE VARIABILITY OF THE SPEECH INPUT 


The next fundamental problem in mechanical recognition arises from the variable 
nature of the speech input. When a human being learns to speak and to 
understand speech, a major part of the process of establishing the phonemic system 
consists in grouping many different qualities into one phonemic unit. This is true not 
only with respect to the sub-phonemic distinctions that are generally noted in 
linguistic analysis, as for instance in English differences of aspiration and of voicing, 
but also changes of quality due to the articulation of sounds on different fundamentals, 
with different degrees of emphasis and so on. These differences are very considerable 
even in the speech of a single speaker. If we take into account the very many 
different speakers who can all be understood by the same listener, then the variability 
of the input is very great indeed. Change from one speaker to another introduces the 
effect of the voice quality of the speaker, the need to change from one system of 
relations to another, particularly in the case of vowels, and the adjustment to personal 
idiosyncrasies of pronunciation which may affect the distribution of sounds. It is 
evident that the listener learns to disregard many differences that are in fact 
perceptible, and to make a great many adjustments in the assigning of incoming
sounds to phonemes. By what kind of mechanism is this phonemic classification
carried out? There is no doubt that the over-riding influence is exerted by the
constraints of the language being used. Any sound heard has to be fitted into one of 
a restricted number of classes, in English about 40, and at every position in a sequence 
the constraints of the language very much reduce the possible choices. The 
exploitation of this type of redundancy in mechanical recognition will be dealt with in 
a later section of the paper, but redundancy is also operative in primary recognition 
since there are generally several acoustic cues available for the recognition of a given 
sound. By using these cues and combinations of them, the listener is able to assign 
a sound to one phoneme or another. 


PROBLEMS OF ACOUSTIC RECOGNITION 


In this section we shall consider the nature of a number of these cues and discuss 
the possibility of their use in mechanical recognition. One set of cues is dealt 
with by the traditional formant theory which was evolved to explain the recognition 
of vowels. Willis (1830), in order to account for the fact that vowels uttered on
different fundamental frequencies and by different people were recognized as the
“same” vowel, came to the conclusion that the presence of high amplitude
components in certain frequency bands was the basis for recognition. This was the first
formulation of the idea of spectral distribution of energy as a cue for recognition. 
It was not then suggested, though it has been since (Potter, 1950), that the frequency 
relation of the formants (that is, for example, between the first and second formants 
of a given vowel) might be an important cue for recognition, rather than the frequency 
location of each formant. This has not so far been established by strong evidence
but it is possible that such information might be used in acoustic recognition, at least 
of vowels. The use of spectral information as an analogue of quality discrimination 
in a mechanical recognizer has already been described. In the repertory of this 
machine there are only four vowel units, /i, a, u, ə/, and the mean value for correct
recognition of these vowels is 78%, with one of the vowels already at 92%. This is 
no indication that vowel recognition would remain at the same high level if all the 
English vowels were introduced into the machine repertory. It is clear, however, 
that the vowels are easier to recognize mechanically than many of the consonants and 
that spectral distribution of energy is a fairly satisfactory cue for their acoustic 
recognition. It is probably not worth while to seek more sophisticated methods of 
vowel recognition. 
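The difference between the two possible cues, formant location and formant relation, can be illustrated numerically. The formant values below are hypothetical, and the second speaker is modelled, as a simplifying assumption, by scaling every formant by the same factor; real differences between speakers are less regular, and the text notes that the ratio hypothesis is not firmly established.

```python
import math

# Hypothetical first and second formant frequencies (c.p.s.) for one vowel.
f1, f2 = 300.0, 2300.0

# A second speaker modelled (hypothetically) as a uniform upward scaling.
scale = 1.25
g1, g2 = f1 * scale, f2 * scale

print(g1 - f1, g2 - f2)                   # absolute locations move: 75.0 575.0
print(math.isclose(f2 / f1, g2 / g1))     # True: the F2/F1 relation is unchanged
```

Under this assumption a recognizer keyed to fixed frequency bands would see the two tokens differently, while one keyed to the ratio of the formants would not.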

In recent years the formant theory has been extended to the recognition of 
consonants and there are many experimental results which suggest that this is 
justifiable. In the case of consonants, as well as vowels, the presence of high amplitude 
components in certain frequency regions seems to be important for recognition and 
this cue is made use of in the recognizer. The machine repertory at present includes 
space and nine consonants:

/t k f s z§mail/ 
By using the energy distribution cue in the way already described, the mean value 
for correct consonant recognition in the speech of one speaker is 68%, the highest
value being 94% for /ʃ/ and the lowest 23% for /k/. These scores, although they
are encouraging, are not high enough to satisfy the demands of a practical system, 
and, as we have said in the case of vowels, the addition of further consonants to the 
repertory will inevitably change the success with which the present units are 
recognized. In mechanical recognition, as in human functioning, we are dealing with a 
system of discriminations and not with any kind of “absolute” identifications, and
hence any change in the system, such as the addition of further units, increases the 
difficulty of recognizing all units in the system. It is important therefore to consider 
the possible use of any further acoustic cues that are known to be important for the 
human listener. 

This can be done most conveniently by taking each broad class of consonants 
separately — plosives, fricatives and periodic continuants. Much of our present 
knowledge about the role of acoustic cues in the recognition of these sounds is the 
result of experimental work by the Haskins Laboratories. This work has been 
reported in a number of publications (see Liberman, 1957) and in general only the 
results will be referred to here. 


THE RECOGNITION OF PLOSIVE CONSONANTS 


Plosive consonants are some of the most difficult sounds to recognize mechanically, 
mainly because the acoustic output from the speaker is relatively low during their 
production. During the stop of the plosive, there is no output at all for the voiceless 
or fortis sounds (/p, t, k/) and almost none for the voiced or lenis sounds. When the 
plosive is released the stop is followed by a burst of noise, of considerable intensity 
in the case of the fortis sounds but very weak in the case of the lenis sounds. Experi- 
ments have shown (Liberman, 1952) that the location of this burst in the frequency 
dimension may be used by the listener as a cue for distinguishing between bi-labial, 
alveolar and velar plosives. The occurrence of a burst in the range 3000-4000 c.p.s. 
produces recognition of an alveolar plosive, in the range 300-700 c.p.s. of a bi-labial 
plosive and in the range 700-3000 c.p.s. the recognition is very much affected by the 
vowel formants with which the burst is combined. In this range, bursts combined with 
open vowels are generally recognized as velar consonants, but when combined with 
close vowels, as bi-labial consonants. 
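The Haskins results summarized above amount to a context-dependent rule, which the following Python sketch writes out. The treatment of the boundary values (exactly 700 or 3000 c.p.s.) and the choice of /i/ and /u/ as the close vowels are assumptions of the sketch; bursts outside the reported ranges are left unclassified.

```python
from typing import Optional

CLOSE_VOWELS = {"i", "u"}  # assumption: /i/ and /u/ treated as close, others as open


def place_from_burst(burst_cps: float, vowel: str) -> Optional[str]:
    """Place of articulation suggested by the burst frequency and the vowel."""
    if 3000 <= burst_cps <= 4000:
        return "alveolar"
    if 300 <= burst_cps < 700:
        return "bi-labial"
    if 700 <= burst_cps < 3000:
        # In this range the judgment depends on the accompanying vowel.
        return "bi-labial" if vowel in CLOSE_VOWELS else "velar"
    return None  # outside the ranges reported


print(place_from_burst(1500, "a"))  # velar
print(place_from_burst(1500, "i"))  # bi-labial
```

The same burst is assigned to different places in different vowel contexts, which is why a machine that reads the burst frequency alone cannot be expected to match the listener in every context.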

The present repertory of the recognizer includes only /t/ and /k/. Recognition of 
these sounds on a spectral basis is equivalent to detecting the frequency of the burst 
and is fairly successful in the case of /t/ (82% correct recognitions) but rather less 
so in the case of /k/ (23%). In order to recognize plosives at all, it has been found 
necessary to establish that a burst as distinct from a continuous noise has occurred, 
that is to say, the machine makes use of a duration cue as well as a spectral one. For 
/t/ and /k/, the frequency information used is yielded, not by a pair of filters, but 
by a single filter in each case. The mid-point of the filter band for /t/ is about 4000 
c.p.s. and for /k/ about 1600 c.p.s. When there is a peak of energy in the 4000 c.p.s. 
filter, this fact is registered and a timing circuit inspects the output of this filter for
125 msecs. If the output from the filter falls to a low level during this period, then 
the energy peak was due to a burst of noise and the sound is acoustically recognized 
as /t/. If, however, the level is still high at the end of this period, then the /t/ 
recognizer circuit is inhibited and recognition proceeds on the basis of filter output 
products, and the sound is most probably recognized as /s/. This is an example of 
the use in the mechanical recognizer of combined physical cues, and it parallels the 
human listener’s procedure in the same circumstances. 
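The duration test can be sketched as follows. Only the 125 msecs. inspection period comes from the text; the sampling interval, the two levels and the function name are hypothetical, and "falls to a low level" is modelled as any reading at or below the low level within the window.

```python
from typing import Optional, Sequence

FRAME_MSECS = 5     # assumed sampling interval of the filter output
WINDOW_MSECS = 125  # inspection period of the timing circuit, from the text
PEAK_LEVEL = 0.5    # hypothetical level that registers as an energy peak
LOW_LEVEL = 0.1     # hypothetical level that counts as having fallen away


def inspect_4000_cps_filter(levels: Sequence[float]) -> Optional[str]:
    """Return 't' if the first peak dies away within the window (a burst),
    'inhibit' if it does not (continuous noise: the /t/ circuit is inhibited
    and recognition falls back on the filter-pair products), None if no peak."""
    window = WINDOW_MSECS // FRAME_MSECS
    for start, level in enumerate(levels):
        if level >= PEAK_LEVEL:
            # Inspect the readings that follow the peak, up to 125 msecs.
            following = levels[start + 1 : start + 1 + window]
            if any(x <= LOW_LEVEL for x in following):
                return "t"
            return "inhibit"
    return None


burst = [0.0] * 4 + [0.8, 0.6, 0.3, 0.05] + [0.0] * 30
hiss = [0.0] * 4 + [0.7] * 40
print(inspect_4000_cps_filter(burst), inspect_4000_cps_filter(hiss))  # t inhibit
```

The spectral cue alone cannot separate these two inputs, since both peak in the same filter; the decision rests entirely on how long the energy lasts.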


In the recognizer, once it is established that a burst has occurred, discrimination 
between /t/ and /k/ proceeds on the basis of the burst frequency. With the filters 
selected for this purpose, /t/ is recognized if the burst is within the range 3500-4500 
c.p.s., and /k/ if it is in the range 1400-1800 c.p.s. approximately. From the Haskins 
Laboratories’ experimental results, and assuming that the latter reflect reasonably well 
the state of affairs in live speech, we may make some predictions as to the success 
of recognition with such an arrangement. The vowels recognized by the machine are 
/i/, /a/, /u/ and /ə/, and in order to recognize /t/ and /k/ with complete success
in combination with all these vowels, we should need to allow for variation in the 
burst frequency over a range wider than one filter band-width. When a burst is 
recognized as /t/ only if it falls in the range 3500-4500 c.p.s. we should expect most 
successful recognition before the vowel /i/, good recognition before /a/ and probably 
poor recognition with /u/. The central vowel /ə/ was not used in the Haskins
experiments, but in view of its formant structure, we might expect it to be between 
/a/ and /u/. Experimental results from the recognizer follow this predicted pattern 
with some fidelity. /t/ is recognized with an accuracy of 100% when it precedes /i/, 
and of 70% before /a/. When it precedes /u/, however, the score is only 25% 
correct recognitions and before /ə/, 47%. In the case of /k/, prediction is more
difficult because the absence of /p/ from the repertory is likely to have a greater effect 
on /k/ recognitions than on /t/. It seems likely however that if bursts in the range 
1400-1800 c.p.s. are recognized as /k/, then most correct recognitions will probably 
occur before /a/, rather less before /u/ and /ə/, and least before /i/. In fact the
experimental results show 71% correct recognitions of /k/ before /u/, 60% before 
/a/, 50% before /a/ and only 20% before /i/. 
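Once a burst has been detected, the discrimination between /t/ and /k/ reduces to a band test. A minimal sketch, assuming that the approximate band edges quoted above are exact; a burst that lands between or outside the two bands, as the Haskins results suggest an alveolar burst may do before some vowels, is simply left unrecognized.

```python
from typing import Optional


def plosive_from_burst(burst_cps: float) -> Optional[str]:
    """Assign a detected burst to /t/ or /k/ from its frequency, using the
    approximate filter bands quoted in the text."""
    if 3500 <= burst_cps <= 4500:
        return "t"
    if 1400 <= burst_cps <= 1800:
        return "k"
    return None  # the burst falls between or outside the two bands


print(plosive_from_burst(4000), plosive_from_burst(1600), plosive_from_burst(2500))
```

Each band is narrower than the range over which the burst frequency varies with the following vowel, which is consistent with the vowel-dependent scores reported above.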


It seems likely that the combination of spectral information with time information 
will provide an adequate means of recognizing voiceless plosive consonants as a class, 
and of distinguishing between bi-labial, alveolar and velar plosives. There remains 
the difficult problem, which arises also in the case of fricatives, of distinguishing 
between fortis and lenis sounds, the so-called voiceless and voiced pairs /p-b, t-d,
k-g/. In current English speech, /b, d and g/ are not voiced when they occur at
the beginning or end of a group, and even medially the voicing is relatively weak. It
seems clear, therefore, that an attempt to distinguish, for example, between /t/ and
/d/ by detecting periodicity during the stop of /d/, even if this could be done
successfully, would be useless for many occurrences of the sounds. The Haskins experiments
on the recognition of plosives have included many in which listeners have been asked
to recognize /b, d and g/, but they have not so far been designed to discover the
particular cues used by the listener in distinguishing between /p/ and /b/, /t/ and
/d/, /k/ and /g/. This discrimination has not yet been attempted in the mechanical 
recognizer but it may be useful to discuss ways in which it might be effected. 


It has been found that an important cue for the recognition of many consonants 
(including the plosives) is provided by the variation of formant frequency with time 
during the “vowel” stretches of the speech waves. These formant changes, usually
referred to as transitions, are evident both when the consonant precedes and when it
follows the vowel, and they are powerful cues in the case of plosive consonants
(Delattre, 1955): all plosives are accompanied by a rising transition in formant 1;
a rising transition in formant 2 accompanies a bi-labial plosive, no matter what the 
vowel is, but in the case of alveolar and velar plosives, the type of transition is 
dependent on the vowel with which the plosive is combined. In close vowels, a 
falling transition is generally associated with velar plosives, and in more open vowels, 
with alveolar plosives. The latter may be marked by an absence of second formant 
transitions and in general produce a less sharply falling transition than do the velar 
plosives. The importance of the transitions becomes all the greater when we consider 
that in many cases other cues, such as a burst of noise, may be weak or entirely 
missing. In the lenis plosives, for example, the burst is very weak and in unreleased 
plosives is missing altogether. In these cases it may be necessary to use formant
transitions as the basis for distinguishing place of articulation. The detection of formant transitions
provides one possible method of discriminating between fortis and lenis plosives. The 
rising transition of formant 1, found in all plosives, is generally less sharp and less 
extensive in the case of /p, t and k/ than in /b, d and g/, and it is possible that the 
registering of this difference might enable the mechanical recognizer to make this 
difficult discrimination. There are several techniques by which this type of information
could be used; the formant tracking devices already in use in analysis-synthesis
systems might, for example, be adapted for this purpose.
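As a rough sketch of how the first-formant cue might be registered, the following Python assumes an F1 track (in c.p.s.) already produced by some formant tracker and sampled every 10 msec from the release. Both thresholds are hypothetical placeholders, since the text gives no numerical values; only the direction of the rule (a sharper, more extensive rise for /b, d, g/) comes from the text.

```python
import numpy as np

def f1_rise(track_hz, frame_ms):
    """Extent (Hz) and mean slope (Hz per msec) of the rise from onset to the F1 peak."""
    track = np.asarray(track_hz, dtype=float)
    peak = int(np.argmax(track))
    extent = track[peak] - track[0]
    slope = extent / (max(peak, 1) * frame_ms)
    return extent, slope

def fortis_or_lenis(track_hz, frame_ms=10.0,
                    min_extent_hz=250.0, min_slope_hz_per_ms=8.0):
    """Lenis if the F1 rise is both extensive and sharp, otherwise fortis.
    The two thresholds are illustrative assumptions, not measured values."""
    extent, slope = f1_rise(track_hz, frame_ms)
    if extent >= min_extent_hz and slope >= min_slope_hz_per_ms:
        return "lenis"
    return "fortis"

print(fortis_or_lenis([300, 450, 600, 700, 700]))  # lenis: 400 c.p.s. rise in 30 msec
print(fortis_or_lenis([550, 620, 680, 700, 700]))  # fortis: only 150 c.p.s. rise
```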


Further cues for this distinction may be afforded by the time pattern of events 
in the utterance. We should expect that the burst in a fortis plosive would be followed 
by a greater time interval before the onset of the vowel formants than would occur 
in a lenis plosive, and this is confirmed by the experience of the Haskins Laboratories. 
Measurement of this time might provide a means of distinguishing /p/ from /b/ etc., 
when the plosives occur before a vowel. Another time factor would also be of use in 
recognizing post-vocalic plosives, namely the ratio of the duration of a vowel to that of a
following consonant. Experimental work along these lines has been done (Denes, 1955)
with fricative consonants, and a fuller consideration of its possible use will be
undertaken in the section on fricatives.
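The burst-to-vowel interval could be measured from two filter outputs: a high-frequency filter to catch the burst and a low-frequency filter for the larynx tone that marks the onset of the vowel. The sketch below is illustrative only; the 5 msec frame, the energy thresholds, the 30 msec boundary and all function names are assumptions, not figures from the Haskins work.

```python
def first_above(values, threshold, start=0):
    """Index of the first frame at or after `start` whose value exceeds threshold."""
    for i in range(start, len(values)):
        if values[i] > threshold:
            return i
    return None

def burst_to_voicing_ms(hf_energy, lf_energy, frame_ms=5.0,
                        burst_thresh=0.5, voicing_thresh=0.5):
    """Time from the burst (high-band energy) to voicing onset (low-band
    energy); None if either event is missing."""
    burst = first_above(hf_energy, burst_thresh)
    if burst is None:
        return None
    voicing = first_above(lf_energy, voicing_thresh, start=burst)
    if voicing is None:
        return None
    return (voicing - burst) * frame_ms

def is_fortis(interval_ms, boundary_ms=30.0):
    """Hypothetical boundary: a longer burst-to-voicing interval suggests fortis."""
    return interval_ms > boundary_ms

hf = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
lf = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
gap = burst_to_voicing_ms(hf, lf)
print(gap, is_fortis(gap))  # 35.0 True
```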

Altogether the problem of discriminating between fortis and lenis plosives in 
acoustic recognition seems to be a particularly difficult one. The cues suggested could 
certainly be used but would be needed in combination and would entail great 
complexity in the machine. This is undoubtedly one of the cases in which use could
be made of the reader’s linguistic knowledge and one would probably be content to 
disregard the fortis/lenis distinction, that is to print p for either /p/ or /b/ and so on. 


THE RECOGNITION OF FRICATIVE CONSONANTS 


The fricative class of consonants presents on the whole a somewhat easier task for 
acoustic recognition than the plosives, as many of them provide an acoustic input of 
reasonable intensity lasting for a longer period. Experiments on the recognition of 
fricatives by listeners (Harris, 1958; Hughes and Halle, 1956) show the spectral
distribution of energy to be an important cue. All these sounds (/f, v, θ, ð, s, z,
ʃ, ʒ, h/) have a component which can be thought of as a continuous-spectrum noise
occupying a specific part of the frequency range. Discrimination between fricatives
made at different points of articulation (labio-dental, dental, alveolar and glottal)
depends in part on the lower frequency limit of the band of noise. If this limit is in
the region of 3600 c.p.s., the fricative is recognized as /s/; if it is at about
2000 c.p.s., as /ʃ/; and if at 1200 c.p.s., as /f/. The cut-off frequency of the friction
noise in /θ/ is less determinate and does not seem to be as important a cue for this
fricative. The quality of /h/ is very much dependent on the following vowel and its 
recognition is governed more by the formants of the friction than by its cut-off 
frequency. It may therefore be necessary to use some of the other cues that have been 
suggested by the Haskins experiments in order to make all the discriminations 
required in the fricative class. 

The first of these is the intensity of the friction noise compared with that of an 
accompanying vowel. Both /s/ and /ʃ/ show a high intensity level whereas /f/,
/θ/ and /h/ are of low intensity. It would be possible by the use of an intensity
cue to divide these sounds into two groups, the alveolars on the one hand and the dentals
and glottals on the other, and by the subsequent use of the spectral cue to discriminate
between the members of one group. A third cue that may be used as an aid to
identification is the nature of the second formant transition (Harris, 1958). It may be
necessary to have recourse to this for identifying the dental fricative /θ/, which will
remain probably the most difficult to identify because the friction noise is weak and 
has no very definite cut-off frequency. 

In its present form the recognizer does not deal with all the fricatives, but it does 
discriminate between /f, s and ʃ/. In dealing with them the recognizer uses two types
of cue discussed above: noise spectrum and intensity. /ʃ/ is recognized on spectral
information alone, whilst /s/ and /f/ are distinguished from /ʃ/ on a spectral basis
and from each other by the intensity criterion. The band used for /ʃ/ is
2,000-3,000 c.p.s. and that for /f/ and /s/ 5,000-8,000 c.p.s. The correct recognitions
achieved in this way are: 63% for /f/, 79% for /s/ and 94% for /ʃ/.
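The two-stage decision just described (spectrum first, then intensity) might be sketched as follows. The bands are the ones quoted above; the FFT band energies, the measurement of intensity relative to an accompanying vowel, the -10 dB split and the use of pure tones in place of friction noise are all our assumptions for illustration.

```python
import numpy as np

SH_BAND = (2000.0, 3000.0)   # band used for /ʃ/ (from the text)
SF_BAND = (5000.0, 8000.0)   # band shared by /s/ and /f/ (from the text)

def band_energy(x, rate, band):
    """Energy of x inside band = (lo, hi) Hz, via an FFT."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / rate)
    return float(power[(freqs >= band[0]) & (freqs <= band[1])].sum())

def classify_fricative(friction, rate, vowel_rms, split_db=-10.0):
    """Spectral cue first (/ʃ/ versus /s, f/), then intensity relative to the
    accompanying vowel (/s/ strong, /f/ weak). split_db is hypothetical."""
    if band_energy(friction, rate, SH_BAND) > band_energy(friction, rate, SF_BAND):
        return "ʃ"
    rms = np.sqrt(np.mean(np.square(friction)))
    relative_db = 20.0 * np.log10(rms / vowel_rms)
    return "s" if relative_db > split_db else "f"

rate = 20000                 # samples per second, enough to cover 8,000 c.p.s.
t = np.arange(400) / rate    # 20 msec of friction
vowel_rms = 1 / np.sqrt(2)   # RMS of a unit-amplitude reference vowel

print(classify_fricative(np.sin(2 * np.pi * 2500 * t), rate, vowel_rms))         # ʃ
print(classify_fricative(0.8 * np.sin(2 * np.pi * 6000 * t), rate, vowel_rms))   # s
print(classify_fricative(0.05 * np.sin(2 * np.pi * 6000 * t), rate, vowel_rms))  # f
```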

We have so far dealt only with the fortis fricatives as being representative of the 
fricative group and there remains, as in the case of plosives, the question of 
distinguishing fortis from lenis consonants. There seem to be at least four different 
acoustic cues for making this distinction: (1) the presence of a low frequency
component due to voicing, (2) duration, (3) a formant transition, and (4) intensity. At
present only the first two of these are used in the recognizer, which deals with only one
fortis-lenis opposition, /s-z/.

Although /z/ is fully voiced only between voiced sounds, it is possible to identify 
at least some of its initial occurrences by detecting the presence of a low frequency 
component. This has been done by the paired filter technique, one of the pair being 
the higher of the /s/ filters and the second a low-frequency filter which detects the 
larynx tone. The drawback of this method is that if we try to detect the larynx tone 
in final /z/ the low frequency filter also gives an output every time there is a voiced 
sound, vowel or consonant, and the voltage product from the /z/ pair of filters often 
has a high value. As a consequence, the recognizer has a tendency to print z a great 
many more times than the phoneme occurs, and hence the identification of final /z/ 
by this means is not very satisfactory. For this reason, another cue had to be found 
for making the fortis-lenis distinction in final position. Previous experiments by one 
of us (Denes, 1955) have shown that the duration ratio of a fricative to a preceding 
vowel is a powerful cue in discrimination between fortis and lenis consonants. The 
friction of /z/ is invariably short compared with that of /s/ and this time difference 
is used in the recognizer. It will be remembered that a discrimination based on the 
duration of energy in a high frequency filter has already been used to distinguish /t/ 
from /s/. A third value of critical duration has now been added: a short duration,
125 msecs., for identifying /t/; an intermediate duration, 250 msecs., for /z/; and any
greater duration for /s/. This has proved quite successful, giving a correct
recognition score of 66% for final /z/. 
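The three-way duration rule can be written out as below. Treating 125 and 250 msecs. as the upper limits of the /t/ and /z/ ranges is our reading of the text, and measuring duration as the longest run of frames above a threshold in the high-frequency filter is an assumed implementation; the function names are hypothetical.

```python
T_MAX_MS = 125.0   # "short" duration: /t/
Z_MAX_MS = 250.0   # "intermediate" duration: /z/; anything longer is /s/

def high_band_duration_ms(energies, frame_ms, threshold):
    """Longest run of frames in which the high-frequency filter output
    exceeds threshold, converted to msec."""
    longest = run = 0
    for e in energies:
        run = run + 1 if e > threshold else 0
        longest = max(longest, run)
    return longest * frame_ms

def classify_by_duration(duration_ms):
    """Map a friction duration onto /t/, /z/ or /s/."""
    if duration_ms <= T_MAX_MS:
        return "t"
    if duration_ms <= Z_MAX_MS:
        return "z"
    return "s"

d = high_band_duration_ms([0, 1, 1, 0, 1, 1, 1, 0], frame_ms=50.0, threshold=0.5)
print(d, classify_by_duration(d))  # 150.0 z
```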


The other lenis fricatives are not at present dealt with in the machine. From the 
study of analytical data it is clear that the recognition of these sounds may be difficult. 
The total intensity of /v/ and /ð/ is low, even lower than that of /f/ and /θ/, which
are themselves the weakest members of the fricative group. It may call for a good
deal of ingenuity to recognize /v/ and /ð/ correctly and it is therefore worth noting
that the fortis-lenis distinction for the dental and alveolar fricatives is rarely important
in English. The obvious solution for /θ/ and /ð/ is to print th for both and thus to
conform with orthography. The number of words requiring the /f-v/ distinction
initially is only about 25, so that a negligible number of errors would be produced 
if f were always printed initially for labio-dental fricatives. 


The affricates have not yet been included in the repertory of the recognizer and not 
much can be offered in the way of theoretical treatment of the problems that they 
may pose. They are essentially stop consonants followed by a long frictional release. 
The machine in its present state will not be able to recognize affricates as a plosive 
followed by a fricative because it recognizes plosives by detecting the release and not 
the stop. It will therefore ignore the stop and recognize a fricative only. Two methods 
suggest themselves for recognizing affricates: by detecting the absence of sound 
energy for a short time followed by a relatively long period of friction, or by measuring 
the rate of onset of the frictional energy in order to distinguish these sounds from
fricatives. 
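The first of the two methods, a short absence of sound energy followed by a relatively long friction, might be tested over frame-by-frame energies as in this hypothetical Python sketch. All thresholds and the 10 msec frame are illustrative assumptions, and the rate-of-onset method is not shown.

```python
def looks_like_affricate(total_energy, friction_energy, frame_ms=10.0,
                         silence_thresh=0.05, friction_thresh=0.2,
                         min_gap_ms=30.0, min_friction_ms=80.0):
    """True if a silent stretch is followed directly by a long friction run.
    All thresholds are hypothetical."""
    n = len(total_energy)
    i = 0
    while i < n:
        if total_energy[i] >= silence_thresh:
            i += 1
            continue
        gap_start = i                      # measure the silent stretch (the stop)
        while i < n and total_energy[i] < silence_thresh:
            i += 1
        gap_ms = (i - gap_start) * frame_ms
        run_start = i                      # then the friction that follows it
        while i < n and friction_energy[i] > friction_thresh:
            i += 1
        friction_ms = (i - run_start) * frame_ms
        if gap_ms >= min_gap_ms and friction_ms >= min_friction_ms:
            return True
    return False

# Stop of 30 msec followed by 90 msec of friction.
total = [1, 1, 0, 0, 0] + [1] * 9
friction = [0] * 5 + [1] * 9
print(looks_like_affricate(total, friction))  # True
```

A plain fricative has no silent stretch before the friction, and a plosive has only a short burst after it, so neither satisfies both conditions.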


THE RECOGNITION OF PERIODIC CONTINUANTS 


The last major class of phonemes that has to be dealt with is that of the periodic 
continuants (/m, n, ŋ, l, r, j and w/). /r, j and w/ are not continuants in the phonetic
sense, but it is convenient from the physical point of view to include them in the same 
class as the nasals and laterals. All sounds in this group are vowel-like in character 
and their recognition depends to some considerable extent on formant structure. 
Experimental results indicate that the three nasal consonants, /m, n and ŋ/, give rise
to formants whose frequency varies very little from consonant to consonant. The main
effect of these formants is to produce an impression of nasality and they contribute
relatively little to discrimination between bi-labial, alveolar and velar articulation
(Liberman, 1954; Malécot, 1956). In recognizing these consonants, therefore, we may
expect to detect nasality by the choice of an appropriate filter band or bands but this 
method is not likely to give very accurate discrimination between the various nasal 
consonants. /m/, like the other bi-labial sounds, produces an extensive minus 
transition in all contexts and could be identified by a combination of the nasal 
frequency cue with the transition cue. /n/ gives rise to a transition which may be 
either plus or minus, depending on the associated vowel, and it would therefore be 
more difficult to use this cue in the case of /n/. The transition, whether plus or minus, 
is however less extensive for this sound than for the other nasals, and this might prove 
a possible basis for recognition. The velar nasal /ŋ/ does not occur initially and in
other positions is marked by a sharp plus transition which would be comparatively 
easy to detect. 
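The transition cues for the nasals could be combined into a rule of the following kind. This is a sketch under stated assumptions: the second-formant frequency at the consonant boundary and in the steady vowel are assumed to be known, a "minus" transition is taken to mean a formant starting below its vowel value, and the 150 c.p.s. limit for a "less extensive" transition is invented for illustration.

```python
def classify_nasal(onset_hz, steady_hz, small_hz=150.0, initial=False):
    """Guess /m/, /n/ or /ŋ/ from the second-formant transition.

    extent < 0 is read as a "minus" transition and extent > 0 as "plus";
    this sign convention and small_hz are assumptions."""
    extent = onset_hz - steady_hz
    if abs(extent) < small_hz:
        return "n"                    # small transition of either sign
    if extent < 0:
        return "m"                    # extensive minus transition
    return None if initial else "ŋ"   # sharp plus; /ŋ/ never occurs initially

print(classify_nasal(900, 1500))   # m
print(classify_nasal(1450, 1500))  # n
print(classify_nasal(2300, 1500))  # ŋ
```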

The repertory of the recognizer at present includes only /m/ and /n/ and the 
filter-pairs are employed in detecting the low formant 1 associated with these 
consonants. By this means, the machine decides quite successfully when a nasal 
consonant has occurred, but there is a strong tendency to confuse initial /l/ with
the nasals. Distinction between /m/ and /n/ by the present methods is not possible.
It is an interesting feature of our results to date that the machine distinguishes
more successfully between initial and final nasals than between /m/ and /n/. In
the case of /l/ this acoustic difference between pre- and post-vocalic sounds is well
recognized in the traditional classification of /l/ sounds as “clear” and “dark”. It
will be interesting to see whether later experiments confirm a similar difference
between pre- and post-vocalic nasals. The scores obtained are 86% correct for initial
/l, m, n/ as a group, 94% for final /m, n/ and 31% for final /l/.

These results are in accord with the conclusions reached in experiments on the 
recognition of nasal consonants already referred to and suggest that the necessary 
discriminations may not be possible without an inspection of the formant transitions 
of the neighbouring vowel section. 

The machine repertory does not yet include /r/, /j/ or /w/, but it is convenient 
to consider these consonants together, since recognition experiments (O’Connor, 1957)
indicate that they may present allied problems. In /r/, recognition appears to depend 
partly on the presence of a fairly strong third formant. This formant is, of course, 
simply a maximum in the energy distribution curve, and if the listener is depending 
on this peak as a cue, then we should be able to achieve good recognition by pairing 
filters in the right way and using a filter which is likely to detect this particular 
maximum. If, however, the listener is using a more complex cue, such as the 
occurrence of all three maxima, or the frequency relation of the three formants, then 
there might be a case for using sets of three filters instead of two. It is evident 
that we could not adopt a three-filter pattern for just one or two sounds, since 
recognition depends on the products of filter outputs, and such a step would disrupt 
the relations between the sounds in the system. It is not the principal aim of our 
experiments to improve “primary” or acoustic recognition but rather to explore the
extent to which statistical information can be used in a mechanical recognizer, and 
we have not therefore attempted acoustic recognition with more than two filters for 
each sound. It is probable that the use of three-filter patterns for all sounds would 
improve acoustic recognition and it might help to solve the problem of discrimination 
in the periodic continuant group in particular. Distinction between the various nasals
might be more successful, as might the distinctions between /l/ and the nasals and between /l/ and /r/.
The last distinction would almost certainly require the use of formant transition 
information since the experiments referred to have shown that /r/ requires a slow 
minus transition in the third formant, whereas the third formant for /l/ remains 
steady. Identification of /j/ or /w/ would depend on the same principle, as these 
sounds are produced by a move through the articulation of /i/ and of /u/ respectively, 
and hence they consist of relatively slow transitions to and from the accompanying 
vowels. 
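A small numerical example shows why two- and three-filter patterns cannot simply be mixed. With filter outputs normalized to lie between 0 and 1, every extra factor can only shrink a product, so a sound given a three-filter pattern enters the maximum detector at a systematic disadvantage. The filter values are invented, and the geometric-mean option is one possible remedy that we add as an assumption; it is not described in the text.

```python
import math

def pattern_score(outputs, filters, geometric_mean=False):
    """Product of the selected filter outputs (each normalized to 0..1).
    geometric_mean=True takes the n-th root of an n-filter product, an
    assumed normalization not taken from the text."""
    product = math.prod(outputs[i] for i in filters)
    return product ** (1.0 / len(filters)) if geometric_mean else product

outputs = [0.7, 0.7, 0.7, 0.5, 0.9]  # hypothetical normalized filter outputs
pair = (3, 4)       # a sound recognized by a two-filter pattern
triple = (0, 1, 2)  # /r/, say, given a three-filter pattern

print(pattern_score(outputs, pair), pattern_score(outputs, triple))
print(pattern_score(outputs, pair, True), pattern_score(outputs, triple, True))
```

The raw products give about 0.45 against 0.34, so the pair wins; after normalization the triple scores about 0.70 against 0.67, although the average filter output is 0.7 in both patterns.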

This section on the problems of recognizing mechanically specific groups of 
sounds can be summarized by stating that a considerable measure of success can be 
achieved by the adoption of the simple type of primary or acoustic recognition at 
present employed in the mechanical recognizer. In other words, the identification of 
sounds on the basis of formants or peaks in the spectral distribution of energy is a 
good working method. If however we wish to improve the acoustic recognition still 
further, we are compelled to adopt the method of the human listener and work with 
combinations of cues. In many cases, very complex arrangements of cues will be 
needed and their use would require a high degree of flexibility in the machine. 


THE PROBLEM OF DIFFERENCES BETWEEN SPEAKERS 


Any success so far achieved in mechanical recognition has been in dealing with the 
speech of one individual speaker, and in this sense, the machine is very far from 
meeting the needs of a practical system. The problems raised by facing the recognizer 
with the speech of different speakers are not, however, different from those we have 
already considered. Such a step will introduce a greater degree of variability in the 
input but will not involve variation in fresh dimensions. One method of dealing with
this greater variability is to make the acoustic recognizer self-adjusting to an input
from a fresh speaker. To take the case of the existing mechanical recognizer, let us
suppose that the machine is adjusted for the speech of speaker X, that is to say,
appropriate filter pairs have been selected to give the greatest success in the
identification of sounds in his speech. If speaker Y now talks into the recognizer, with
no re-adjustment of the machine, the acoustic recognition of sounds is much less
successful; in the present machine the proportion of correct identifications falls from
75% with speaker X to 50% with speaker Y. It has been found that the
additional errors are often due to the fact that the maxima of energy for speaker Y occur 
in filters adjacent to those selected for speaker X and the necessary changes in the 
selected filter pairs will now improve recognition for speaker Y. It is quite feasible 
to carry out such an adjustment automatically in the recognizer. The machine would 
adjust itself on the basis of a standard speech sequence, preferably containing all the 
sounds in its repertory. The speaker repeats the standard sequence and in the 
“setting up” condition, the recognizer “knows” what the required solution is. This
knowledge is fed back to the acoustic recognizer and the selection of filter pairs is
adjusted until the operation of the acoustic recognizer provides the “right” answer.
This arrangement of filter pairs is then maintained for subsequent input from this 
speaker. Such a process is analogous to what happens in the human listener. The 
latter, when he first hears an unfamiliar speaker, or any speaker in unfamiliar 
conditions, adjusts his recognition process in order to abstract the proper system of 
relations from the new acoustic input. He does not require the speaker to repeat a 
test sentence because he has enough information from contexts of all kinds to tell him 
what is the correct solution for any given speech sequence. 
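The self-adjusting procedure can be sketched as a search over filter pairs using the standard sequence, whose phoneme labels the machine "knows" in the setting-up condition. The selection criterion used here (mean product on the phoneme's own frames divided by mean product on all other frames), the toy filter values and the function names are assumptions; the text says only that the pairs are adjusted until the recognizer gives the right answer.

```python
from itertools import combinations
import numpy as np

def select_filter_pairs(frames, labels, eps=1e-9):
    """For each phoneme in the standard sequence, choose the filter pair whose
    product is largest on that phoneme's frames relative to all other frames."""
    frames = np.asarray(frames, dtype=float)
    labels = np.asarray(labels)
    pairs = {}
    for ph in np.unique(labels):
        own, other = frames[labels == ph], frames[labels != ph]
        best_score, best_pair = -np.inf, None
        for i, j in combinations(range(frames.shape[1]), 2):
            own_mean = np.mean(own[:, i] * own[:, j])
            other_mean = np.mean(other[:, i] * other[:, j]) if len(other) else 0.0
            score = own_mean / (other_mean + eps)
            if score > best_score:
                best_score, best_pair = score, (i, j)
        pairs[str(ph)] = best_pair
    return pairs

def recognize(frame, pairs):
    """Maximum detector over the filter-pair products of one input frame."""
    return max(pairs, key=lambda ph: frame[pairs[ph][0]] * frame[pairs[ph][1]])

# Toy standard sequence: /a/ excites filters 0 and 1, /i/ filters 2 and 3.
frames = [[1.0, 1.0, 0.1, 0.1], [0.1, 0.1, 1.0, 1.0]]
labels = ["a", "i"]
pairs = select_filter_pairs(frames, labels)
print(pairs)                                   # {'a': (0, 1), 'i': (2, 3)}
print(recognize([0.9, 1.0, 0.2, 0.1], pairs))  # a
```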


THE USE OF LANGUAGE STATISTICS IN THE MACHINE 


When all such measures have been taken to deal with the great variability in speech 
inputs to a mechanical recognizer, there is no doubt that the recognizer output will 
still evince a large number of errors. The thesis underlying our experimental work 
from its inception has been that the number of errors can be reduced to a manageable 
level only by simulating another stage of recognition as it proceeds in the human 
listener. This second stage consists in applying linguistic knowledge to the results 
of primary recognition and it will be useful first to see how this process works. 

As we have seen, the first operation in the listener’s reception of speech is the 
segmenting of the continuous physical input and the assigning of successive segments
to particular phonemes. This primary recognition is based on a variety of physical 
cues, and when these are translated into psychological dimensions, we see that 
recognition depends very largely on the perception of quality differences, with 
assistance from length, pitch and loudness discriminations. The important feature 
of primary recognition is that linguistic information is already being used by the 
listener since every unit recognized must belong to the system of the message 
language. In decoding an English message, the listener forces each succeeding unit
at the phonemic level into one of the forty-odd English phonemes. The inclusive 
nature of mechanical recognition hence parallels the working of the human listener, 
although the latter retains the freedom to decide that the message is not in a language 
he understands. 

In making phonemic identifications the listener not only confines himself to a 
phonemic system, but at each position in the sequence he is strongly influenced by 
sequential probabilities. The process of learning a language consists in part of 
acquiring knowledge of sequential probabilities at all levels, and Shannon (1950) has 
demonstrated the existence and the force of this knowledge in a reader of English. 
His technique has been used (Fry, 1955) to show a similar effect at the phonemic 
level in speech. The recognition of every succeeding phoneme is affected therefore 
by the listener’s knowledge of which phoneme is most likely to follow the one already 
identified, and which phonemes are impossible in that position. His choice is very 
much constrained by these considerations and it is owing to this high redundancy in 
speech that the listener rarely makes an error in phoneme identification in face of 
the great variability of the speech sounds that he hears from many different speakers 
and in varying listening conditions. 

Phoneme recognition can be thought of as a scanning process in which incoming 
patterns are compared with stored patterns in the brain, but in which a decision rests 
not simply upon closeness of fit between the incoming and the stored pattern. 
Statistical knowledge in the brain determines what range of stored patterns shall be 
scanned at any stage in the speech sequence and resolves many of the uncertainties 
that result from matching on a purely acoustic basis. The rest of the process of 
decoding a spoken message is a serial scanning operation carried out on the same 
lines, but with units of greater magnitude. The phoneme sequence resulting from the 
first scanning is being continually inspected at the next level in order to identify 
morphemes, and here again statistical knowledge constrains the choice. At succeeding 
levels, morpheme sequences are recognized as words and word sequences as sentences. 
The decoding process may then be considered to be complete. 

The two most important features of this system are the continual use of statistical 
information at all levels and the feeding-back of information from one level to another. 
The decoding of spoken messages by the listener is relatively free of error because 
of these two factors. On the phonemic level, for example, there may be positions in 
a sequence where an error occurs even after the application of statistical knowledge to 
the results of primary recognition. In a sequence beginning /let im wei... / the 
ensuing segment might be identified as /k/ and there would be nothing to forbid this 
from the point of view of probability. At this point, decisions have also been taken 
at morpheme and word levels and these too do not contradict the recognition of /k/. 
Suppose, however, that there follows a phoneme sequence /til/; this gives rise to a
further morpheme identification and there is immediate uncertainty because the 
probability of the morpheme transition wake till, in this sequence, is zero. If this 
is followed by /tomorou/, the uncertainty is resolved and information is fed back
that leads to the replacing of /k/ by /t/ in the phoneme sequence. This can, of course, 
be stated simply as a correction of error at the word level by replacing wake by wait 
but it is the common experience of listeners that they are aware of these corrections 
at the phonemic level. In clearing up a misunderstanding one may well say, “I
thought you said /weik/, not /weit/”, with a deliberate effort to stress the phonemic
opposition. The fact that such errors are rare is due to the feeding-back of information 
from level to level and the consequent reduction in the number of units that must be 
scanned to arrive at any one identification. 

The importance of this application of statistical knowledge of the language in the 
reception of speech can hardly be over-estimated. The resistance of speech
communication to noise and distortion is largely a measure of this factor and of the degree
to which the listener’s a priori knowledge is capable of counteracting variations in
the acoustic speech input. Primary recognition provides for the listener a scaffolding, 
as it were, upon which he can construct probable sequences of linguistic units. A 
mechanical recognizer which could simulate this action with a high degree of fidelity 
would undoubtedly work successfully even if its acoustic recognition were far from 
perfect. The nearer the machine came to imitating the inter-related levels of operation 
that have just been outlined, the nearer would it approach the flexibility and certainty 
of the human recognition apparatus, with its ability to function with virtually no 
errors in the face of widely varying speech inputs. On the other hand, a machine 
working on acoustic recognition alone is unlikely to reach a level of accuracy, even 
in dealing with the speech of a single speaker, that would make it of real practical 
interest. 

It is, we can say at once, not very practicable to reproduce the linguistic mechanism
of the listener to any considerable extent. We have seen already that the storage
problem is serious if we adopt words or sentences as our recognition units, and it would
be even more difficult to deal with the input on several different levels, with interaction
between them. The question is therefore whether any of the advantages of statistical
effects can be obtained without the use of prohibitively large stores and complex
inter-connections. By considering all input sequences as successions of phonemes
only, we shall be sacrificing the long-range influence of morpheme, word and sentence
structure.

Suppose that the recognizer is provided with a knowledge of phoneme sequential 
probabilities up to m places. If it were operating on a purely statistical basis, it 
would clearly put out a number of sequences which do not form English words, as 
we see from Shannon’s (1948) approximations to printed English. If, however, these 
sequences were then scanned and matched with a repertory of word forms, for 
example, they would all be identified as English words, with a consequent improvement
in the accuracy of recognition, whether this were reckoned on a word or
a phoneme basis. Such an arrangement is not very feasible in practice, but it has
seemed worth while to explore the possible uses of a much simpler system confined to
sequential probabilities at the phoneme level. Provided that knowledge of probabilities
could be extended to a number of places comparable with the mean word-length, we 
might expect this method to give reasonably satisfactory results. 
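The matching step mentioned above, replacing a recognized phoneme string by the nearest stored word form, could be sketched with an edit (Levenshtein) distance. That distance measure is our choice for illustration; the text names no particular matching procedure, and the toy lexicon is invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def nearest_word(phonemes, lexicon):
    """Replace a recognized phoneme string by the closest stored word form."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, word))

lexicon = ["sit", "kit", "mat", "sun"]
print(nearest_word("sik", lexicon))  # sit
```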

In the experimental mechanical recognizer, only the first step has so far been taken 
in the exploration of this problem. The machine combines with the results of acoustic 
recognition, information concerning phonemic sequential probabilities to two places. 
When a given phoneme has been recognized, for example, as /s/, the recognizer has 
available knowledge as to the probability that any other phoneme will follow /s/, 
and this knowledge is used in making the next decision. This statistical information
is obtained by counting the phoneme digram frequencies in the speech material that 
constitutes the input to the machine. Since the phoneme repertory of the recognizer 
is limited, the words used in operating it are also limited to a selection of English 
words made up exclusively of the sounds within the machine repertory, and it is, in 
fact, dealing with an artificial language. The digram frequencies for this artificial 
language make up the statistical knowledge of the machine, and this knowledge is 
stored in the form of an n × n matrix of potentiometers, where n is the number of
phonemes in the recognizer (at present 14, including the space). The settings of
these potentiometers carry the information as to sequential probability for sets of two
phonemes, each column of the matrix being associated with the first phoneme and
each row with the second phoneme in a set. Thus the /s/ column “knows” the
relative probability that each of the 14 phonemes will follow /s/; the row associated
with /s/ “knows” how likely it is that /s/ will follow any of the 14 phonemes. This
knowledge is made use of in the following way. Suppose that the last letter printed 
by the recognizer represents /s/. A memory unit holds this information while the 
machine is proceeding to the next recognition. In the probability matrix, an 
instruction derived from the memory unit switches a common voltage to all the 
potentiometers in the /s/ column. The output of this column is a set of voltages whose 
magnitude is proportional to the probability that any given phoneme will follow /s/. 
Meanwhile the acoustic recognizer is now supplying 14 voltages representing the 
output products of the filter pairs. The output product of each filter pair is now 
combined with the appropriate probability voltage, that is, the filter pair product for 
/s/ is multiplied by the probability voltage for /s/, the filter pair product for /i/ 
by the probability voltage for /i/ and so on throughout the repertory. The maximum 
detector now selects the greatest of these products and the machine decides to print 
the symbol associated with it. If, in our example, the selected symbol were /i/, the 
memory unit would register this fact and in the next recognition operation the matrix 
column associated with /i/ would be energized. Thus at each position in the phoneme 
sequence the machine’s decision is based on a combination of acoustic and statistical 
information (see Fig. 2 for a schematic diagram). 
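The arrangement can be simulated in a few lines of Python: a matrix of digram probabilities takes the place of the potentiometers, a variable holding the last printed symbol stands in for the memory unit, and `argmax` plays the maximum detector. This is a sketch, assuming column-normalized digram counts and per-phoneme filter-pair products as input; the four-symbol repertory and the training strings are toy data, not the machine's 14-symbol artificial language.

```python
import numpy as np

def digram_matrix(transcripts, symbols):
    """Estimate P(next | previous). As in the text, column k belongs to the
    preceding phoneme and row j to the following one; non-empty columns sum to 1."""
    index = {s: k for k, s in enumerate(symbols)}
    counts = np.zeros((len(symbols), len(symbols)))
    for seq in transcripts:
        for prev, nxt in zip(seq, seq[1:]):
            counts[index[nxt], index[prev]] += 1
    totals = counts.sum(axis=0, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def biased_decision(acoustic, prev, matrix, symbols):
    """Energize the column for `prev`, multiply each filter-pair product by
    its probability voltage, and let the maximum detector choose."""
    column = matrix[:, symbols.index(prev)]
    return symbols[int(np.argmax(np.asarray(acoustic, dtype=float) * column))]

def decode(acoustic_frames, matrix, symbols, start=" "):
    """Chain decisions; `prev` plays the part of the memory unit."""
    prev, printed = start, []
    for acoustic in acoustic_frames:
        prev = biased_decision(acoustic, prev, matrix, symbols)
        printed.append(prev)
    return "".join(printed)

symbols = [" ", "s", "i", "t"]   # toy repertory, including the space
matrix = digram_matrix([" sit ", " its ", " tis "], symbols)
frames = [[0.1, 0.6, 0.3, 0.5], [0.1, 0.2, 0.5, 0.6],
          [0.2, 0.4, 0.1, 0.5], [0.7, 0.1, 0.1, 0.3]]
print(repr(decode(frames, matrix, symbols)))  # 'sit '
```

In the second frame the acoustic products alone favour t, but since t never follows s in the toy material, the biased decision prints i.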

Before going on to discuss the results obtained by this method, it may be worth 
while to point out that with the present arrangement, acoustic and statistical 
information have equal weight in determining the result of a given operation. 
Fig. 2. Schematic diagram showing the arrangement by which statistical information is
combined with acoustic information in the mechanical speech recognizer.


The result depends upon rank-ordering the final products of acoustic and probability
voltages (although in practice only one, the greatest, of these produces a result) and 
this rank order is determined by the relations of the voltages in the acoustic set and 
the relations of the voltages in the probability set. These proportions are unaffected 
by any change in the absolute level of, for example, the common voltage supplied to 
the probability matrix. This voltage could be halved or doubled without affecting the 
rank order of the final products and without altering the force of the probability 
voltage in deciding this order. It would, on the other hand, be possible to change 
the weighting of the acoustic or the probability factor by changing the proportions 
within one set in some simple fashion. If in the final product, for example, the 
acoustic voltage were still directly proportional to the filter outputs, whilst the 
probability voltage were proportional to the square of the probability, the statistical 
information would then have more weight than the acoustic information in the 
operation of the machine. 
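Both points, that halving or doubling the common voltage leaves the decision unchanged while squaring the probabilities shifts the weight towards the statistics, can be checked with a small sketch. The numbers are invented and the function name `winner` is hypothetical; squaring is simply the example transformation suggested in the text.

```python
# Hypothetical filter-pair outputs and one (energised) probability column.
ACOUSTIC = {"i": 0.6, "e": 0.8, "t": 0.2}
COLUMN = {"i": 0.5, "e": 0.4, "t": 0.1}


def winner(acoustic, column, common_voltage=1.0, prob_exponent=1):
    """Phoneme selected by the maximum detector.

    common_voltage scales every probability voltage by the same factor;
    prob_exponent > 1 changes the proportions within the probability set
    (probability raised to a power), which increases its weight.
    """
    products = {
        ph: acoustic[ph] * common_voltage * column[ph] ** prob_exponent
        for ph in acoustic
    }
    return max(products, key=products.get)


print(winner(ACOUSTIC, COLUMN))                      # e
print(winner(ACOUSTIC, COLUMN, common_voltage=0.5))  # e  (voltage halved)
print(winner(ACOUSTIC, COLUMN, common_voltage=2.0))  # e  (voltage doubled)
print(winner(ACOUSTIC, COLUMN, prob_exponent=2))     # i  (probability squared)
```

Scaling multiplies every product by the same constant, so the rank order cannot change; squaring alters the ratios within the probability set and here reverses the decision in favour of the statistically preferred phoneme.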

Experiments have been made so far with the comparatively simple kind of statistical 
biasing described above and they have been designed to discover whether such a 
technique is practicable and would make any considerable difference to the success 
of the recognizer. It must be emphasized that the machine is dealing with a restricted 
repertory of phonemes, a restricted vocabulary of words and with the speech of a
single speaker. The statistics used are those of this artificial language, as we have
said, and the recognizer is being employed here as a computer to calculate the results 
of interaction between this statistical information and acoustic recognition. The 
machine is arranged to operate either on the basis of acoustic recognition alone 
(referred to as the unbiased condition) or of acoustic plus probability information 
(the biased condition). In the unbiased condition the sound articulation score is 60% 
and the word score 24%, whilst in the biased condition the corresponding figures are 
72% and 44%.¹ The influence of statistical information is very marked here;
phoneme recognition is improved by the biasing, and the number of words correctly
recognized consequently almost doubles.


One feature of this biasing system must be mentioned, since it is connected with a 
general problem in mechanical recognition. As soon as any use is made of sequential 
probability, there arises the problem of sequential errors. At a certain position in the 
phoneme sequence, the machine makes an error, and thereupon the probabilities set 
up for the ensuing operation are also wrong. In these circumstances the use of statistical 
information may lead to additional errors. It is clear that a certain level of accuracy 
in acoustic recognition is necessary if the use of a probability factor is not to lead 
to an increase rather than a decrease in errors, and further, it may be inadvisable to 
change the relative weighting of the acoustic and probability factors, in the manner
suggested above, for similar reasons, though this is a question which can be settled
only by experiment. Sequential errors also occur in the case of the human listener. They
form the basis of the parlour game in which a whispered message is started at one
point in a circle of people and is transmitted with the object of seeing what semantic
distortions will have taken place by the time the message has come full circle; they are
well exemplified, too, in the classic case of the signaller who passed the verbal message
“Send three and fourpence, I’m going to a dance” in place of the original, “Send
reinforcements, I’m going to advance”. The listener, once embarked on the wrong
set of constraints in a particular sequence, may add one error to another for quite a 
long stretch. That this problem is not at present a serious one in the mechanical 
recognizer is due to the artificial conditions in which the machine is working. Speech 
is fed into the recognizer with a gap after every word, and the decision to recognize a 
space is taken on a purely acoustic basis. When a certain time elapses with no acoustic 
input to the system, this fact is registered by the time-measuring circuit and the 
machine records a space. The period of silence required to operate the space circuit 
is pre-determined and has to be set to such a value that the stop of a plosive consonant
will not operate it but genuine spaces in the speech input will not be missed. These
conditions can be met without much difficulty and as a result we have in the space- 
detector a section of the acoustic recognizer that works with an accuracy of virtually 
100%. This fact is of cardinal importance for the working of the statistical biasing in 


1 These scores are taken from Table III of an earlier paper (1957). 
the machine. The space figures as a unit in the probability matrix but it influences only
the recognition of phonemes following space, and is not itself influenced by phonemes 
preceding space. When a space has been recorded, the probabilities set up are those 
for each phoneme in word-initial position, and since space is recognized with complete 
accuracy, the sequential probabilities are correct at all these points in the sequence. 
Hence the frequent occurrence of spaces in the present working of the recognizer
is a powerful check on the occurrence of sequential errors. 
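The way the space detector limits sequential errors can be sketched as follows, under stated assumptions: the 150 ms silence threshold is a hypothetical value (the text says only that it is pre-determined so as to lie between a plosive stop and an inter-word gap), `#` stands for space, and the three-phoneme repertory and all probabilities are invented. A short silence is ignored, a long one is printed as a space on acoustic grounds alone, and the next phoneme is then chosen with the word-initial column.

```python
SPACE = "#"
# Hypothetical threshold: longer than a plosive stop, shorter than the
# gap left between words. The real value is not given in the text.
SPACE_THRESHOLD_MS = 150

# Hypothetical digram probabilities; the "#" row holds word-initial
# probabilities. Space itself is never chosen from this matrix.
PROBS = {
    "#": {"s": 0.50, "i": 0.20, "t": 0.30},
    "s": {"s": 0.05, "i": 0.80, "t": 0.15},
    "i": {"s": 0.45, "i": 0.05, "t": 0.50},
    "t": {"s": 0.30, "i": 0.60, "t": 0.10},
}


def is_space(silence_ms, threshold_ms=SPACE_THRESHOLD_MS):
    """Purely acoustic space decision based on the length of silence."""
    return silence_ms >= threshold_ms


def decode(segments, probs):
    """segments: ("silence", duration_ms) or ("sound", filter_outputs) pairs."""
    memory = SPACE  # treat the start of the input like a preceding space
    printed = []
    for kind, value in segments:
        if kind == "silence":
            if is_space(value):
                printed.append(SPACE)
                memory = SPACE  # reset: the word-initial column is used next
            continue  # a short silence (e.g. a plosive stop) is ignored
        column = probs[memory]
        products = {ph: value[ph] * column[ph] for ph in value}
        memory = max(products, key=products.get)
        printed.append(memory)
    return printed


SEGMENTS = [
    ("sound", {"s": 0.9, "i": 0.3, "t": 0.4}),
    ("sound", {"s": 0.2, "i": 0.7, "t": 0.3}),
    ("silence", 80),   # stop of a plosive: not a space
    ("sound", {"s": 0.2, "i": 0.2, "t": 0.9}),
    ("silence", 300),  # gap between words: recorded as a space
    ("sound", {"s": 0.8, "i": 0.3, "t": 0.5}),
]
print(decode(SEGMENTS, PROBS))  # ['s', 'i', 't', '#', 's']
```

Whatever error might be made inside a word, the column selected after a space is always the correct word-initial one, so an error cannot propagate past the next space.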

Future extensions of work with the mechanical recognizer will involve additions to 
the phoneme repertory with consequent increases in the word vocabulary and in the 
variety of phoneme sequences, particularly in consonant clusters. At each stage the 
problem of sequential errors is likely to become more important and the machine 
should serve with increasing usefulness as a computer in assessing the effect of adding 
to the acoustic and the statistical knowledge available. If we assume that 40 recognizer 
circuits would be enough to deal with the whole phoneme repertory of English, then 
the use of digram probabilities calls for a total of 1600 biasing circuits. To extend 
the statistical knowledge to trigram or tetragram frequencies would involve a very 
large, though not impossibly large, number of identical circuits (64,000 for trigram 
frequencies). As we have said, extending the scope of the statistical information 
should in theory increase the certainty of recognition, but we have seen that there 
will be attendant drawbacks, and it would be unwise to embark on a large scale 
operation until all these aspects of the problem have been explored. Such exploration 
is the present task of our experimental work in mechanical recognition. 
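The circuit counts follow from simple arithmetic: with a repertory of N recognizer circuits, a digram matrix needs one biasing circuit for every ordered pair of units, N², and a trigram matrix N³. The sketch below only evaluates these powers for the assumed N = 40; the function name is hypothetical, and the tetragram figure is computed in the same way rather than taken from the text.

```python
def biasing_circuits(repertory_size, order):
    """One circuit per possible sequence of `order` units: the digram
    matrix needs repertory_size ** 2 circuits, trigrams ** 3, and so on."""
    return repertory_size ** order


for order, name in [(2, "digram"), (3, "trigram"), (4, "tetragram")]:
    print(name, biasing_circuits(40, order))
# digram 1600
# trigram 64000
# tetragram 2560000
```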


THE ROLE OF THE “READER”


The present paper has made little reference to the role of the human “reader”
who must at some stage deal with the output of the machine. This problem has been 
treated in a previous paper (Fry and Denes 1957) where we have stressed the fact 
that any linguistic knowledge utilised in the machine will always be supplemented 
by the reader’s own linguistic knowledge. If the output of the machine is presented 
in a familiar form to the reader his linguistic knowledge can be exploited fully. The 
less familiar the form in which the output is presented, for example through the use 
of a phonemic transcription instead of orthography or through the occurrence of many 
errors on the part of the machine, the less the contribution made by the reader’s 
linguistic knowledge. Learning on the part of the reader will increase his familiarity 
with the form of presentation and increase the use he can make of his linguistic 
knowledge. It can be said that the amount of information built into the machine can 
be traded for learning or for linguistic information in the reader. One of the practical 
applications of the recognizer may well be to situations where a restricted ensemble of 
messages is in use. This would increase the value of the reader’s linguistic information 
and thus reduce the amount of learning required. 

These last considerations provide a further argument for the view that the 
performance of the reader remains the final criterion of the success of the recognizer, 
and solutions to the problems we have outlined here must in the end be judged by
their value in simplifying the task of the man who has to use the machine. 
SPEECH ANALYSIS AND MENTAL PROCESSES


FRIEDA GOLDMAN-EISLER 
University College, London 


The contribution of speech analysis to the scientific study of the processes involved in 
psychotherapy is reviewed. Objective measures of the mental processes underlying 
the act of speech resulting from speech analysis are reported, the problem of linking 
these to physiological states discussed, and examples presented of such relationships. 

The analysis of the rate and nature of speech production in particular is shown to 
have led to the isolation of speech parameters which are relevant to the investigation of 
speech as a process reflecting affective and cognitive experiences.


INTRODUCTION 


The scientific study of language and speech phenomena has in recent years received 
impetus from the needs of several disciplines. In this paper I propose to consider the 
relevance of the analysis of the speech processes to psychiatry, and in particular to 
psychotherapy. 

By creating the method of psychotherapy, Freud has raised the verbal activity of
patients to the status of an aim in itself rather than a means by which doctors obtain
information. The basic tenet of this technique is that talking out has beneficial effects
in releasing tension, bringing relief to anxiety and removing “psychogenic” physical
symptoms. Freud’s original formulation of this assumption stresses that verbal 
expression must be given to affect. Implicit in the assumption is the idea that there 
is a continuum of processes linking language and speech to the psycho-physiological 
states of the organism ; the recent development of psychosomatic medicine is a 
logical outcome. 

This assumption has, however, never been verified except perhaps on a basis of 
post hoc ergo propter hoc. We do not know how, that is to say, by what chain of 
processes the effect is achieved. The psychiatric interview as a therapeutic technique
has therefore remained a matter of controversy, and in spite of its wide acceptance in
psychiatric practice its protagonists are ill-equipped to meet the challenges with which
they have recently been faced (Eysenck, 1952). The case against the effectiveness of
psychotherapy, however, apart from being inconclusive, has remained at the level of
statistical and non-specific evaluation and adds nothing to our knowledge of the
actual processes involved. Statistical evidence was cited that cures after professional 
psychotherapy have not been more frequent than improvements when contact was 
maintained with practitioners only. As, however, there can be no doubt that relief 
after talking out has been obtained, this effect must be considered a most 
important fact in psychology irrespective of how frequently and under what title it may 
be achieved. The question for the science of psychology is not the practical 
one of how curative in a statistical sense the method of psychotherapy is. The
scientific question posed by the phenomenon of relief brought to states of anxiety,
tension and to physical symptoms through talking is how such an effect can come about
at all ; what chain of events leads from the act of verbalisation to the physiological 
transformations in the organism which are relevant to the subjective states of anxiety 
or tension. Even assuming that this is a phenomenon occurring only under certain 
conditions, scientific curiosity is challenged to disentangle and lay bare the threads 
between the two events, for bridging the gap between the symbolic process of speech 
and the physiological activities of the organism amounts to tracing in detail the psycho- 
physical connection. A greater understanding of the mechanism underlying such 
techniques as the psychiatric interview might result in the better control of its effects 
and a more selective and discriminating application of the psycho-therapeutic method. 
The understanding of the symbolic and expressive phenomena involved in speech 
might also throw light upon the kind of effect achieved through biochemical and 
physical methods of treatment. 

There hardly exists a better entry to the solution of this problem than the analysis 
of the speech process. Let us consider the reasons for this. 

(1) Of all forms of behaviour the formal aspects of speech are most easily divided 
into units and measured objectively. 

(2) The act of speech is a meeting-ground for functions and activities of the 
organism (mental and physiological) at all levels. Speech production is achieved 
through the co-ordination of muscular, respiratory and neural activities on the one 
hand and of cultural, intellectual and emotional forces on the other. 

While verbal concepts represent the tendency towards rationalizing and objectifying 
the world of experience they carry with them meaning derived from a variety of 
sources, unconscious, developmental, historico-cultural. In the contrary sense, the
activity of speech, involving the action of the organism at all levels, must continuously
bear upon expression (a kind of positive feedback effect) and, under the impact
of linguistic structure and sequential dependencies, must also determine the choice of
words. Utterance is not only an end product of the speech process but also a
determinant in the construction of meaning and thinking itself. 

Thus central processes not only initiate, direct and act as selectors of speech 
activity, but are themselves affected by the autonomic action involved in verbal 
expression and the linguistic constraints affecting verbal sequence. We must in fact 
conceive of the central and autonomic, voluntary and automatic processes as 
continuously inter-acting and alternating as soon as speech is uttered. 

(3) Speech is a process by which the complex web of intra- and inter-organismic 
relations is externalised. By virtue of this, it is not only, as Fournié wrote in 1887,
“the only window through which the physiologist can view the cerebral life” (cited
in Lashley, 1951), but also the only window through which the psychologist may view
the dynamic patterning woven of motivating, controlling and environmental forces. 

(4) This can be done only by learning to identify the various levels of the organism’s 
activity in the different levels and elements of language and speech. To this end it
is necessary to analyse the speech process into measurable units such as might prove 
efficient indicators of these psycho-physiological levels. 


SPEECH ANALYSIS 


Confronted with the wealth of linguistic, vocal and expressive phenomena that make 
up the totality of speech, the investigator who undertakes to cut a path through this 
field may easily feel paralysed by the demands of the task on his powers of choice and 
decision. With the ideal of scientific rigour before him he will, perhaps regretfully, 
but without much hesitation bypass the content of speech, though he may leave this 
area of facts for later reference or cross-checking purposes. He will then be left
with the formal aspects of speech, the facts which belong to the manner of speech 
and again he will concentrate on those which can be recorded and measured or 
counted. He will still be left with an array of elements to study which must tax his 
powers and time far beyond capacity. In pursuing his task further, he will be 
dominated by the purpose of all scientific pursuit, the demand for constants related 
by laws to the elements representing the universe of his study. 


THE RATE OF SPEECH PRODUCTION 


The writer’s choice fell on what in the first place represented itself as the rate of 
speech production. This was selected from a group of 12 elements of speech, some 
linguistic (parts of speech), and some temporal (durations and frequencies of periods 
of talk and silence) (Goldman-Eisler, 1954). It was chosen because of its intriguing 
combination of constancy and variability. 

(1) The rate of speech production was the output of speech measured in syllables 
per unit time (minutes). This quantity takes no account of the fluctuation of output 
within the unit time ; it is an average giving us no information about the distribution 
of vocal sequences during that basic period. But the nature of speech production 
is such that a continuous flow of verbal output is only rarely achieved. In most 
cases utterances are series of verbal productions differing in length and broken up 
into discrete elements by pauses of varying duration. Speech rate calculated on the 
basis of duration of utterances was shown (Goldman-Eisler, 1956) to be largely 
determined by the duration and frequency of these pauses. The speed of talking 
measured as the number of syllables per second in the period spent in actual speech 
(rate of articulation) plays a significant part in the rate at which speech is produced 
over a period of time only in the rare cases when speech is completely fluent. The 
explanation for this lies in the different variabilities of pauses and speech activity. 
The variation coefficients for the proportional duration of pauses ranged between 
28.2% and 91.1%, and for the rate of articulation between 11.5% and 25.1%. The
variability of the total speech rate is thus a function of the high degree of variability in 
the time which speakers spend hesitating between sequences of actual speech. All 
the same, the speech rate proved to be a consistent element in individual speech 
behaviour and individual speakers were characterised by a certain preferred level.
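The measures distinguished in this paragraph can be written down as a short sketch. The function names and the example utterance are hypothetical, and the coefficient of variation is taken here as the sample standard deviation expressed as a percentage of the mean; that is the usual definition, but it is an assumption about how the figures quoted were computed.

```python
from statistics import mean, stdev


def speech_rate(syllables, articulation_s, pause_s):
    """Syllables per minute over the whole utterance, pauses included."""
    return 60.0 * syllables / (articulation_s + pause_s)


def articulation_rate(syllables, articulation_s):
    """Syllables per second of actual speech, pauses excluded."""
    return syllables / articulation_s


def pause_proportion(articulation_s, pause_s):
    """Share of the utterance's duration spent in hesitation."""
    return pause_s / (articulation_s + pause_s)


def variation_coefficient(values):
    """Sample standard deviation as a percentage of the mean (assumed)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical utterance: 30 syllables, 6 s of speech, 4 s of pauses.
print(speech_rate(30, 6.0, 4.0))   # 180.0 syllables per minute
print(articulation_rate(30, 6.0))  # 5.0 syllables per second
print(pause_proportion(6.0, 4.0))  # 0.4
```

Because the pause time enters the denominator of the overall rate but not that of the articulation rate, a speaker whose pauses vary widely will show a widely varying overall rate even if his articulation rate is nearly constant.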

(2) At the same time, speech rate showed itself capable of being diverted from its 
habitual level. For example, one interviewer’s intolerance and inhibitive interference 
with the patient’s expressive needs (Goldman-Eisler, 1954a) drove his preferred level 
of output of 170-175 words per minute up to 232 words per minute (seven times 
the standard deviation expected for speech rates). 

(3) This shift outside the patient’s preferred rate of speech production proved 
symptomatic of a general re-arrangement in the grammatical and temporal pattern of 
his speech as well as of that of his interviewer. In particular it was accompanied by 
a reduction in the proportion of interviewing time which this speaker was allowed to 
occupy, owing to continuous interruption by the interviewer.

By increasing his speed of talking, however, the patient succeeded in compensating 
for loss of time, and his total output of words remained unchanged. By virtue of the
elasticity of the speech rate the patient had been able to adapt to the 
change of the interviewing situation while maintaining his word output at a constant 
level. 
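The compensation described here rests on word output being rate multiplied by time. The paper does not report how much talking time the patient actually lost; the sketch below only computes, for the two rates quoted, what fraction of his former talking time would have sufficed to keep his word output unchanged. The function name is hypothetical.

```python
def time_fraction_for_same_output(old_rate_wpm, new_rate_wpm):
    """Fraction of the former talking time needed at the new rate to
    produce the same number of words (words = rate x time)."""
    return old_rate_wpm / new_rate_wpm


fraction = time_fraction_for_same_output(170, 232)
print(round(fraction, 3))  # 0.733, i.e. about 27% less talking time
```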

(4) When speech rates (number of syllables per minute) were calculated for each 
separate utterance exchanged in conversations or interviews, individual differences 
were again highly significant. Speech rate fluctuated for any one individual in the 
different situations at varying degrees, but within each individual’s range. 

(5) Apart from this it was found that the amount of freedom to vary speech rate 
is a function of the length of utterances. The range for short utterances varied from 
very slow (60 syllables per minute) to very fast (600 syllables per minute). As 
utterances grew longer, this range narrowed down, and a remarkable degree of stability 
was gained at a length of about 100 syllables. As may be seen from Fig. 1, the
standard deviation decreases to about two thirds of its value (from about 60 to 40 for utterances of
100 syllables). At the same time, as utterances become longer, speech becomes
uniformly slower (Fig. 2); high average speeds are never maintained for utterances
longer than 100 syllables. The range of speech rates for long utterances (100 syllables 
and more) varied from 140 to 320 syllables per minute. (This is an inclusive range for
several speakers.) 


Fig. 2. Mean talking speeds against length of utterance (no. of syllables), with utterances of over 100 syllables as base-line; solid line: normals, dashed line: patients.


In the light of these facts the increase from an average of 170 to 232 words per 
minute in the speech of the patient mentioned before may have been the result of 
his utterances being shorter, as we should expect them to be since he was being 
continually interrupted by the doctor. It is likely that in the course of the interview 
he learned to anticipate this and to crowd as much talk as he could into a short time. 
The fact of the wide variation in the speed of utterances is thus to some extent reduced 
to a variation in the length of utterance and this has further implications. 

The two facts which have emerged in the investigations referred to, namely (1) that
speech rate is a function of pauses in speech, and (2) that longer utterances are
slower than short utterances, lead to the conclusion that not only the absolute length
but also the proportion of pausing time is higher in longer utterances than in shorter ones.
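The step from the two facts to the conclusion can be made explicit. If the rate of articulation A (syllables per second) is comparatively stable, the overall rate R (syllables per minute) satisfies R = 60A(1 - p), where p is the proportion of the utterance spent pausing, so a lower R with unchanged A implies a larger p. The sketch assumes, for illustration only, an articulation rate of 5 syllables per second and two invented overall rates; the function name is hypothetical.

```python
def implied_pause_proportion(speech_rate_spm, articulation_rate_sps):
    """Pause proportion p from R = 60 * A * (1 - p), with R the overall
    rate in syllables/minute and A the articulation rate in syllables/second."""
    return 1.0 - speech_rate_spm / (60.0 * articulation_rate_sps)


ARTICULATION = 5.0  # hypothetical, held constant for both utterances
short_fast = implied_pause_proportion(280, ARTICULATION)  # short utterance
long_slow = implied_pause_proportion(180, ARTICULATION)   # long utterance
print(round(short_fast, 3), round(long_slow, 3))  # 0.067 0.4
```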

From this it would appear that there is a class difference between the long utterances 
and the short utterances exchanged in situations of interaction. 

An examination of the records of speech rates of partners in conversation suggested
that the difference in the conditions of interaction under which short interjections or
lengthy statements are made is one between “participation and sole possession of the
field”, and that “the object of participation and possession in speech situations is time,
the dimension in which linguistic behaviour progresses” (Goldman-Eisler, 1954b). (“In
a conversational situation, time is a quantity to be shared between partners...”)
The problem of distribution of time in conversation becomes critical when the demand 
for it exceeds the supply, i.e. either when the interlocutors have the urge to speak
simultaneously or when one speaker’s urge for talking is so disproportionate as to
prompt him to encroach on time which his partner is not prepared to forego. Short
utterances being free to cover a very wide range of speeds will have a much greater 
capacity to reflect such encroachments upon a co-speaker’s time than long utterances. 
They constitute, to a large extent, interjections or interruptions. Long utterances, on 
the other hand, represent a different aspect of the conversational situation. They are 
the result of the speaker having taken the floor, and are dependent on sufficient time 
being available. They are conditional, too, on the passivity, whether voluntary or 
enforced, of the interlocutor, enabling the speaker to continue talking at some length. 
The speaker thus talks, so to speak, “in his own time”.


GAPS IN THE STREAM OF SPEECH 


In connection with these deliberations the question presented itself as to whether 
there might be some kind of relation between the division of speech into short fast 
utterances and long slow statements, and the time-honoured dichotomy of
interjectional and intellectual speech. It seemed that the answer to this question lay in
a better understanding of the significance and function of the pauses which interrupt 
and thus slow down the flow of speech production. 

Two factors operate in speech making it a discrete rather than a continuous 
performance. (1) Biological necessity forces even the most fluent speaker to stop at 
more or less regular intervals in order to get air into the lungs. (2) Besides these
physically unavoidable halts imposed on every speaker, fluency in speech is frequently 
blocked psychologically by hesitations. 

Breath pauses range between half a second and a second. Measurements of breath 
rates during speech (Goldman-Eisler 1955, 1956a) revealed for the most part a range 
between 2 and 20 respirations a minute. Thus breathing might normally occupy 
between 1.5 and 15 seconds in the minute, i.e. between 2.5% and 25% of the total
speaking time. Measurements of periods of hesitation on the other hand showed
these to occupy anything between 0% and 80% of the total speaking time of one
person (the means for 8 speakers being 4.4, 19.3, 27.9, 29.8, 34.3, 43.6, 53.2 and
47.6 per cent).
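The figures of 1.5 and 15 seconds correspond to 2 and 20 breaths a minute at an average breath pause of about 0.75 s, the middle of the half-second to one-second range; that average is an inference, not a figure given in the text. The sketch converts a breath rate into seconds per minute and into a percentage of speaking time; the function name is hypothetical.

```python
def breathing_time(breaths_per_minute, seconds_per_breath=0.75):
    """Seconds per minute spent inhaling and the corresponding percentage
    of speaking time. 0.75 s per breath is an assumed average."""
    seconds = breaths_per_minute * seconds_per_breath
    return seconds, 100.0 * seconds / 60.0


print(breathing_time(2))   # (1.5, 2.5)
print(breathing_time(20))  # (15.0, 25.0)
```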

Breath rate under normal rest conditions varies between 15 and 18 respirations per 
minute (Clausen 1951) and it is therefore obvious that speaking interferes to some 
extent with breathing and that the phase of expiration is frequently extended to the 
limits of the speaker’s capacity for holding breath. 

The degree to which this is done varies for different speakers and different utterances 
(Goldman-Eisler, 1955), and it may be interesting to consider in what way the speech 
rate may be related to the reduction of breath rate during speech. A fast and fluent 
speaker, for example, who is little held up by hesitation pauses, might be expected to 
put off the moment of inhaling until he comes to a stop dictated by the structure of 
language; the hiatuses required by phrasing or emphasis in the service of
communication are the “correct” places for taking breath according to elocution norms. Thus the
requirements of the art of oral delivery are that it be uninterrupted and continuous
within phrases and that breathing should be delayed until it is grammatically 
convenient. This means that the frequency and placement of breath intake must to 
some extent be a function of the grammatical structure of sentences. This is certainly 
the case when we read prepared texts (Goldman-Eisler, 1958). In spontaneous speech 
where performance is activated from sources in several systems of the organism, 
each representing a different trend, it is more difficult to conform to norms and to the 
laws of language structure, and also to fulfil expressive requirements while impulses 
affecting the respiratory system assault the voluntary control of its action. 

Pre-verbalised and pre-structured meaning is often difficult to accommodate within 
the norms and constraints of conversational language and its fitting into these is one 
factor holding up the flow of speech. Affect finds an outlet in breathing behaviour 
(Goldman-Eisler, 1955) and the control of respiration to fit in with the grammatical 
structure may be beyond the powers of the speaker. The time factor calling for a 
modicum of fluency also competes for attention. Indeed, for a speaker in the act of
organising speech to master symbolic as well as rhetorical expression equally well
requires a high degree of integration and cortical control. Such a high level of planning 
in the course of the spontaneous creation of verbal expression is rarely attained, though 
the corresponding level of performance is aimed at by all who are concerned with the 
effectiveness of their communications. In the performances of reading, oratory or 
acting this aim is to a large extent achieved, owing to the fact that attention can be 
entirely concentrated on presentation. There are, of course, a large number of 
rhetorical speech elements, such as intonation, stress, voice quality, accent, etc., but 
we are considering here only those most directly related to speech rate. 

If presentation must thus fall short of perfection in spontaneous speech, we must 
expect compromise to be attained at the cost of any of the systems involved in speech 
production in different and varying degrees. 

An objective and quantitative record of the kind of compromise attained throughout 
the sequence of speech, identifying the systems and measuring the degrees would 
constitute the scientific description of the expressive quality of the speech concerned. 
It would reflect in continuous sequence the transformations from one level to the 
other showing these in interaction. 

We shall consider below three parameters of verbal expression and their mutual 
relations: fluency, breathing and information. The relation of these to the
psycho-physiological systems (neuromuscular, respiratory and cortical) underlying them
should point the way to further links connecting symbolic and physiological functions. 

Breathing and speech production are functionally related. The vital activity of 
breathing not only continues during the speech process, but the inhaled air supplies 
the energy for the production of articulate sound. The difference between breath 
pauses and hesitation pauses is therefore not only one of the greater variability and 
average duration of the latter, but also of functional relation to the production of 
speech. There is, for the occurrence of hesitation pauses, no such functionally obvious 
reason as for that of breath pauses. Their incidence in the verbal sequence is often
not only syntactically and semantically redundant, but interferes with the structural
cohesion of syntactic grouping and thus with the intelligibility of the message.
Considering, however, that speech, besides being the production of articulate sound
sequences, also involves their reference to meaning, i.e. symbolic processes and their
organisation into complex structures, the assumption that hesitation pauses are
functionally related to the organisation of speech seemed well worth investigating.

The two dimensions of speech, the motor and the symbolic, would thus be served 
by two different types of gap breaking the continuity of speech production. Their 
divided functions are matched by their different nature, breathing being an activity 
in the neuromuscular and autonomic systems (in European languages incompatible 
with vocal productions) and pausing an arrest of, at least peripheral, activity. 

Consistent with this division is the fact that increase of hesitation was found to be 
accompanied by reduced breathing (Goldman-Eisler, 1956b). The duality is thus 
not one of speech and non-speech, but one of activity (speech or breathing) and its 
delay (hesitation), and possibly of peripheral and central activity. The assumption of 
such an alternation would be in harmony with such evidence as has been established 
concerning the psychological significance of variations in the frequency of breathing.
Breath holding during moments of attention and of delay when thinking, or when
anticipating some fresh action on the one hand and hyperventilation as the 
accompaniment of emotional excitement are reported in experimental as well as 
clinical work (Clausen, 1951, Altschule, 1953, Fenichel, 1953, Golla, 1921, Kubie, 
1948, Schilling, 1929, Suter, 1912). The writer’s analysis of speech during psychiatric 
interviews confirmed these conclusions (Goldman-Eisler, 1956a). 

Restricted breathing, on the one hand, with increased pausing and speech content 
betraying restricted emotionality as in attention, inhibition, caution or intellectual 
effort and increased breathing activity, on the other hand, accompanying outgoing 
emotions such as jealousy, sex, aggression and wishes (at the opposite extreme) formed 
an excitation-inhibition syndrome measurable through the process of speech. 

The two types of gap in the speech process therefore seemed to represent the
time-honoured division of speech into intellectual and emotional (Cassirer), voluntary and
automatic (Hughlings Jackson), symbolic and interjectional, etc. This could be verified
by collecting evidence that the speech related to each type of gap could fairly be 
described in one or other of these terms. 


THE FUNCTION OF HESITATION PAUSES 


As is apparent from the above, hesitation pauses are manifestations of the more 
general blocking of activity which occurs when organisms are confronted with 
situations of uncertainty. They may be assumed to be related to the organisation of 
speech in the sense that they occur when the selection of the next step requires an act 
of choice. The next step to be selected may be the next word, or the next series of
words, or the structure determining the sequence of the next series of words. 

Once formulated in terms of information theory, the hypothesis could be tested by 
relating the incidence of pauses in spontaneous utterances to the information content 
of the words constituting them. This was estimated experimentally. 
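The hypothesis turns on measuring a word's information from its transition probability, the probability of that word given the word before it. A minimal sketch, assuming a simple bigram model with add-one smoothing estimated from a hypothetical toy corpus (an illustrative stand-in, not the estimation procedure used in the experiments), computes information in bits as -log2 P(word | previous word). On the hypothesis, hesitation pauses should tend to fall before the high-bit words.

```python
import math
from collections import Counter


def transition_information(corpus_tokens, utterance):
    """Return (word, bits) pairs for each word after the first in `utterance`.

    bits = -log2 P(word | previous word), with P estimated from bigram counts
    in `corpus_tokens` using add-one (Laplace) smoothing. Illustrative only.
    """
    # Each token except the last serves as a left context for a bigram.
    context_counts = Counter(corpus_tokens[:-1])
    bigram_counts = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab_size = len(set(corpus_tokens) | set(utterance))
    pairs = []
    for prev, word in zip(utterance, utterance[1:]):
        p = (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
        pairs.append((word, -math.log2(p)))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy corpus; a real estimate would need a large speech sample.
    corpus = "the cat sat on the mat the cat ran".split()
    for word, bits in transition_information(corpus, ["the", "cat", "ran"]):
        print(f"{word}: {bits:.2f} bits")
```

A habitual sequence ("the cat") receives fewer bits than an unexpected one ("the ran"), which is the contrast between fluent and hesitant stretches that the text describes.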

The experimental evidence (Goldman-Eisler, 1957, 1958) has shown hesitation 
pauses to anticipate a sudden increase of information (measured in terms of transition 
probability); indeed, the close relation found between pauses and information on
the one hand, and between fluency of speech and redundancy on the other, seems to
indicate that the interpolation of hesitation pauses in speech is a necessary condition
for such an increase. Delay is thus an important element in the production of information.
Fluent speech was shown to consist of habitual combinations of words such as were 
shared by the language community and such as had become more or less automatic. 
Where a sequence ceased to be a matter of common conditioning or learning, where 
a speaker’s choice was highly individual and unexpected, on the other hand, speech 
was hesitant. 

Fluency and hesitation in the production of speech consequently indicate its 
properties in other respects. Experimentally it was possible to discriminate them in 
terms of information (entropy) and redundancy. Functionally they coincide with 
Hughlings Jackson’s (1932) division of speech into automatic and voluntary, or “old,
well organised” and “new, now organising speech”. The delays in the production
of speech might accordingly be recognised as the “now” periods of speech 
organisation. 

The question raised above concerning the relation between the division of speech 
into short and long utterances, and the dichotomy of interjectional and intellectual 
speech can now be answered in the affirmative. Time to pause seems to be a condition 
for the kind of central processes (thinking) which underlie new organisation in speech
to take place, whether it be the process of structuring, selection or symbol formation. 

The wide range of variation in speech rates and hence in hesitation pauses as 
between utterances, situations and speakers is therefore a reflection and a measure of 
the wide variation in the proportions in which speakers mix automatic, already 
organised verbal sequences with newly organised speech fitted to a specific meaning 
in a particular situation and adjusted to its special requirements. The immense store 
of learned combinations which is at the disposal of most normal members of a language 
community, though in degrees varying with literacy, education and verbal skill,
enables most people to adapt to the demands of intercommunicatory pressures.
These are mainly (1) time (the ball of conversation must change hands without too
much hold-up), (2) intelligibility and (3) communication of subjective meaning as
experienced by the speaker. The social demands of conversation, including for 
example interviews for employment, put a high premium on the fulfilment of the first 
two. The interlocutor is in a less favourable position to judge the third. Thus speakers
bent on keeping up with the pressure of time and the need for being intelligible and
maintaining rapport will be tempted and constrained to have recourse to ready-made
verbal sequences, phrases and clichés, and subjective meaning itself will be
guided through these channels and modified as a result.

It is in these respects that the psychiatric interview represents an exceptional form 
of interpersonal communication. Its aim is that the patient, having recalled traumatic 
experiences with reactivation of the original affect, should learn to view these 
experiences in the light of his adult judgment. It involves a re-structuring of the meaning 
attached to them, and communication of this new meaning in the process of being 
re-structured to the psychiatrist. It is only by this alternate process of verbal 
expression of affect and creative verbalisation of insight that psychotherapy can 
proceed. In this situation the psychotherapist is to a large extent passive (this is 
correct at least for orthodox psychoanalysis). Time pressure is no factor. The patient 
is sole possessor of the interview time. Intelligibility in logical and mundane terms is 
not demanded; the premium is on spontaneity, on faithful communication of 
subjective experience and its content, and on verbalisation of insight gained. The
psychiatric interview thus creates a setting which is favourable to detachment from
linguistic constraints (through free association) and from the bondage of habitual
verbalisation, at least in the spheres under discussion in psychotherapy; it promotes
their restructuring by means of newly organised verbal expression, and the fitting of
verbal expression to newly gained insights.


THE SIGNIFICANCE OF SPEECH-BREATHING BEHAVIOUR 


The interpretation of the fluctuations of speech rate in the light of the significance
of hesitation pauses alone cannot, however, be complete. Breathing, though less
important as an impediment to the flow of speech, has a profound influence on the
nature of speech production. Two aspects of the breathing activity are relevant to
speech production: (a) the rate of escape or expulsion of the outflowing current of
air, or reciprocally the degree of ventilation of speech (i.e. the amount of air escaping
through speech), and (b) the neuro-muscular activity controlling the action of the
respiratory muscles and thus regulating the rate of escape of the expelled air current.

Speech covers all degrees of air escape, from the most economic use of the
outflowing air current under strict muscular control to the quicker escape of air in a
state of relaxation. The number of syllables uttered per breath was shown to be a
valid measure of the rate of expulsion or escape of the outflowing air in speech
(Stetson, 1928) and consequently a measure of the extent of the transformation of the
energy of expiration into verbal activity. Since the production of speech is a cortical
activity, the extent to which the inhaled air current is utilised for speech production is
a measure of the control exercised over the involuntary action of respiratory reflexes
by the cortical action of speech production.

To obtain this information, breath intake was recorded during spontaneous speech
(Goldman-Eisler, 1955, see p. 54 for method of recording). With this additional datum
the rate of speech could be expressed as the product of breath rate and output of
speech (syllables per breath). Graphically, a given speech rate may be represented as a
hyperbola whose two axes are the breath rate and the expulsion rate (Fig. 3).
Fig. 3. The broken lines indicate speech rates (S.R.) rising in steps of 50 from 150 to
450 syllables per minute, achieved at respiration rates of up to 34 a minute and an
output of up to 100 syllables per expiration.


From this it follows that any particular rate of speech can cover an infinite variety 
of combinations of breath rate and expulsion rate, and therefore very different 
physiological and psychological processes may operate in its production. Speech 
production can fluctuate along any and all of these parameters and constancy 
maintained in one of them involves variation in the other two. When degrees of 
constancy and variation were compared for all these quantities it became evident that 
speakers adapt to the requirements of situation, topic and interlocutor mainly through 
variations in the rate of speech output and that they preserve balance by keeping the 
economy of air outflow per speech unit (syllable) constant (Goldman-Eisler, 1955). 
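The three quantities in this argument are linked by an identity: speech rate (syllables per minute) equals breath rate (respirations per minute) times expulsion rate (syllables per breath). The sketch below assumes a hypothetical data layout in which each passage is summarised as one syllable count per expiration plus its duration in minutes. It computes the three measures and a coefficient of variation for comparing how constant each stays across passages; the coefficient of variation is one convenient constancy index, not necessarily the one used in the original study.

```python
from statistics import mean, pstdev


def passage_measures(syllables_per_expiration, minutes):
    """Return (breath_rate, expulsion_rate, speech_rate) for one passage.

    syllables_per_expiration: syllable count for each breath (one entry per breath).
    minutes: duration of the passage in minutes.
    """
    breaths = len(syllables_per_expiration)
    syllables = sum(syllables_per_expiration)
    breath_rate = breaths / minutes        # respirations per minute
    expulsion_rate = syllables / breaths   # syllables per breath
    speech_rate = syllables / minutes      # syllables per minute
    return breath_rate, expulsion_rate, speech_rate


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean; smaller means more constant."""
    return pstdev(values) / mean(values)


if __name__ == "__main__":
    # Hypothetical passages: (syllables per breath, duration in minutes).
    passages = [([10, 12, 8, 10], 0.5), ([9, 11, 10], 0.25), ([10, 10], 0.4)]
    measures = [passage_measures(counts, mins) for counts, mins in passages]
    names = ("breath rate", "expulsion rate", "speech rate")
    for name, column in zip(names, zip(*measures)):
        print(f"{name}: CV = {coefficient_of_variation(column):.2f}")
```

In these made-up passages, speech rate and breath rate vary while syllables per breath stay fixed, which is the pattern of adjustment the paragraph above reports.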

Deviations from the level preferred in this respect must therefore be interpreted as 
a signal of a different, more vital significance in respect of the speaker’s equilibrium 
than the fluctuations of speech rate. For if we take the fluctuations of breath rate
as indications of the ebb and flow of affect strength (Goldman-Eisler, 1955), a constant
level of expulsion rate (syllable output per breath) indicates that the affect is
under control and being guided into verbal channels. A deviation from this preferred
level is thus an indication that the balance between the forces of affect and control
(or excitation and inhibition) has been upset. A reduction in the output of syllables
per breath means that a larger proportion of the inhaled air current is “puffed out”
or released unused, as steam is puffed from an engine, i.e. that emotional excitation
has not passed through the process of verbalisation. High breathing activity with low
verbal output shows excitation to be unattached and unmarshalled by speech and
all that speech activity involves, such as conceptualisation, the symbolic process, or
connections of affect with cortical activity. What is measured is the delay of the
relaxing phase of the respiratory cycle, although the actual length of the
contraction-relaxation phase depends on the rate of respiration. The two activities of
breathing and speaking are continuously interacting, and the level of control cannot be
judged only on the basis of verbal transformation as measured by the expulsion rate;
it must also take into account the intensity of affect as measured by breath rate.
Different levels of respiratory
activity set the organism engaged in speech different tasks. To maintain the same 
rate of speech production, e.g. 200 syllables a minute, at a low breath rate of, say 
four respirations a minute, an output of 50 syllables per breath is required which 
taxes the speaker’s capacity for gradual release action to quite an extent. To maintain 
the same speech rate at a breath rate of 20 respirations a minute, on the other hand, 
an output of only 10 syllables per breath is required which is not only easy to 
accomplish, requiring a lesser degree of voluntary control, but leaves the speaker 
with a surplus of expiratory air current to play with and use, so as to give his speech 
expression and rhetorical colour. This difference is evident from Fig. 3. When the 
speech rate is changed, the rearrangement in the pattern of speech production may
involve changes such as the following. Assuming a speaker accelerates to produce 360
syllables a minute, a breath rate of four a minute will then hardly be supportable,
since it would require an output of 90 syllables per breath (though this is not
impossible to achieve, see Goldman-Eisler, 1956a). Normally, however, with
acceleration of speech the speaker will also accelerate breathing, particularly where
its rate has been too low for the requirement of verbal production. A breath rate of
20 a minute will then require 18 syllables per breath instead of 10 as before, thus
leaving him less margin for expression and rhetoric. If he wants or is impelled to
use speech in a more expressive way, he will have to slow down or breathe more
often. This implies that speech rate and breath rate are independent dimensions
of speech, though they may at times be related (Goldman-Eisler, 1956b).
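The worked figures in this paragraph all follow from dividing speech rate by breath rate. The short sketch below reproduces them; the function name is illustrative only.

```python
def syllables_per_breath_needed(speech_rate, breath_rate):
    """Expulsion rate (syllables per breath) needed to sustain a given speech rate.

    speech_rate: syllables per minute; breath_rate: respirations per minute.
    """
    return speech_rate / breath_rate


if __name__ == "__main__":
    # The speech rates and breath rates used in the text's examples.
    for speech_rate in (200, 360):
        for breath_rate in (4, 20):
            needed = syllables_per_breath_needed(speech_rate, breath_rate)
            print(f"{speech_rate} syll/min at {breath_rate} breaths/min -> {needed:g} syll/breath")
```

Run as written, it prints 50 and 10 syllables per breath for 200 syllables a minute, and 90 and 18 for 360, matching the values in the text.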


DISCUSSION AND CONCLUSION 

If, as was shown above, speech rate reflects the degree of hesitancy and therefore 
of organisation or automatism in speech, if breath rate indicates the affect strength 
(emotional excitation), and the output of speech per breath (expulsion rate) reflects 
the degree of its cortical control, it might be interesting to explore the significance of 
some possible combinations. 

For example, how can we interpret the mental state of speakers who accelerate 
speech as well as breathing activity? Fast fluent speech, as was shown, tends to be
weighted with habitual, well organised sequences (automatic speech); its highly
ventilated utterance indicates a state of emotional excitement. Recalling Hughlings
Jackson’s (1932) classification of emotional speech as a special instance of automatic
speech, it seems that this particular form of speech production might correspond to
Jackson’s emotional speech. It may be remembered that Jackson referred to speech
clichés serving as vehicles for the discharge of emotion.

A different state must be assumed to give rise to speech which is slow and highly 
ventilated. Being slow implies that it contains a good proportion of hesitation pauses, 
and this according to our results (Goldman-Eisler, 1957, 1958) signifies that symbolic 
and structuring processes are in progress during speech; in this sense the act of
speaking now contains a good proportion of symbolic or intellectual activity with 
less automatic activity than in the previous case. High breathing, on the other hand, 
indicates a state of emotional excitation. 

Deceleration might sometimes be due entirely to heavy breath intake. In this case 
verbal activity has given way to breathing activity, emotion seems to be breaking 
through uncontrolled, and speech activity disintegrates. This may be the case in 
short utterances. It is in the nature of things that persistence in organised and
structured verbal activity over a longer period involves a change of the speaker’s
state in the direction of regained cortical control. This may be in the form mentioned
above of producing a fast flow of automatic speech. In this case the balance is 
restored through the activation of the neuro-muscular system and using affective 
energy in the exercise of skill. Or it may take the form of hesitant speech alternating 
with heavy breathing. Such speech then is produced by a process of alternating 
central activity (thinking) and affective discharge. In this case balance is gained 
through the activation of selective processes achieving a high level of verbal expression 
by fitting speech of specific choice to experience. 

Such is the condition of speech postulated by Breuer and Freud (1944), namely 
that verbal expression must be accompanied by discharge of affect for the therapeutic 
action of talking to become effective. In sequence the process as measured and 
observed by the writer is such that hesitation precedes significant verbalisation and 
heavy breathing (sigh of relief) follows it. This seems to be the sequence typical 
of effort behaviour (Golla, 1921). The psycho-physiological connection of the 
parameters of speech production (fluency, breathing and information) has been 
demonstrated by the writer (Goldman-Eisler, 1956c): under certain conditions speech
produced with hesitation as well as ventilation may be accompanied by a sudden
decrease of muscular tension as measured by the electromyograph. The conditions
were clearly definable in terms of the meaning structure of the content of the
psychiatric interview: tension was relieved after ventilating traumatic and repressed
material. The conditions under which such a configuration of trends (Fig. 4) might
occur in the interview are, however, not yet understood. But the mediating role of
verbal expression was illustrated by the fact that it brought about a re-structuring of
psycho-physiological relationships in a most dramatic manner. Thus the relation
between muscular tension and respiratory activity changed sharply after the
verbal communication of the traumatic material, as may be seen from Fig. 5. This
communication, bearing the character of catharsis as described by Freud, was made
at a reduced rate of speech and heightened breathing activity and led to a dramatic
relief of tension.

Data in the writer’s possession (unpublished) also show that the fluctuations of 
speech rate may be accompanied by a corresponding inverse fluctuation of muscular 
tension (the action potentials of the forearm muscles decreasing as speech rate 
increased and vice versa). This may reflect the relation between effort, in our case 
involved in the choice of words (hesitant speech), and muscular tension (Golla, 1921, 
Schilling, 1929, Suter, 1912), but the relation may also bear a more dynamic and 
causal character, the communication act being operative in reducing muscular tension. 
In both eventualities the conditions under which such a clear cut inverse relation 
prevails are not understood. 

The analysis of speech production has, however, resulted in the isolation of 
parameters of speech production which have proved to be highly relevant as links in 
the psycho-physiological chain. Their nature and function may be defined in the 
following terms:

The variation of speech rate represents the fluctuation of probabilities determining
the choice of verbal sequences and is a reflection of the processes of selection which are 
involved in speech. 

The independence of this process from respiratory fluctuations shows that once the 
process of verbalisation has been entered into, excitation in the organism is compatible 
with symbolic or intellectual activity. This seems due to the serial nature of speech 
allowing alternation of selective, automatic and expressive actions.

It is by virtue of this elasticity of the speech process that speech has become, for 
articulate humanity, the most important factor in the maintenance of balance and 
adjustment to the community. In the process of adjustment through verbal 
communication among individuals, primitive expression of emotion becomes displaced 
by voluntary actions and ideational contents. As a result articulate man has evolved 
a hierarchy of affect ranging from the free and untamed to the intellectualized and 
“patterned”. Instinctive emotion through its connection with linguistic expression 
becomes “felt, conventionalized emotion”. The dichotomy of excitation and 
inhibition owing to the fusion of emotional discharge with linguistic form, becomes 
differentiated and transformed into the continuum of affective experiences which has 
come to characterize all psychic activity. 

The isolation from the process of speech production of parameters of speech 
representing selection processes (hesitation pauses), affect strength (breath rate) and
degree of its control (expulsion rate) and the understanding gained of their function 
has produced a set of powerful tools for the investigation of speech as a process 
reflecting the dynamics of this continuum. 

BOOK NOTICE


Manual of Phonetics edited by L. Kaiser. pp. xv + 460, North-Holland Publishing 
Company, Amsterdam, 1957. Price: 76s. ($10).


This volume appears under the aegis of a Permanent Council which, during the 
last twenty-five years, has been responsible for organizing a number of International 
Congresses of Phonetic Sciences. Dr. Louise Kaiser of the University of Amsterdam, 
who has edited the book, is Secretary of the Council and she indicates in the 
introduction to the Manual that the present publication is to be looked upon as a 
continuation of the Council’s former activities. The book differs from the proceedings 
of a congress in the sense that papers given at international meetings are largely 
addressed to a specialist audience whereas the Manual is intended to reach a wider 
public. The papers are contributed by a large number of authors and cover a 
considerable range of subjects and, like most publications of this type, the book
suffers from the disability that contributors are not agreed upon the level of knowledge 
to be assumed in the reader. 


The Manual opens with a historical section that includes a brief history of 
phonetics (here understood almost exclusively in the sense of experimental phonetics) 
by G. Panconcelli-Calzia and a short review by F. Trendelenburg of more recent 
technical advances in methods of inspecting and measuring the events in speech. 
This is followed by a section entitled “Basal Sciences of Phonetics”, of which the
first chapter, by G. E. Arnold, is on the morphology and physiology of the speech 
organs. It sets out to give the basic anatomy and physiology necessary for an 
understanding of respiration, phonation and articulation, very much in the form in 
which the material would be presented to the student. This chapter is not intended 
to provide much more than is to be found in standard text-books, but, in common with 
all other chapters in the Manual, it gives an extensive and valuable bibliography. 
N. R. French, in the next chapter, deals with auditory considerations in connection 
with speech. In addition to giving the essential facts about hearing, he also discusses 
the results of a number of psycho-physical experiments and their implications for the 
study of speech communication. The chapter on “Phonetics from the viewpoint of
psychology” by A. Gemelli and J. W. Black presents mainly statistical information
gained from studies of the intelligibility of speech at various levels. Readers should 
note that a major error occurs in Table III of this chapter, in the list of frequencies 
for British English, and they would be well advised to consult the original paper 
referred to. Statistical methods in phonetics are treated by E. B. Newman who gives 
the basic statistical operations that are applicable to the study of speech and language 
and also gives a very brief exposé of Zipf’s and Shannon’s contributions to the subject. 


The next major section is entitled “Phonetic Research”. The first four chapters
are contributed by G. E. Peterson who deals mainly with research methods available 
for the study of respiration, phonation and articulation in speech and with the results 
of applying some of these methods. In the chapter on “Breath stream dynamics”,
there is a fairly extended treatment of the work of Stetson and his collaborators. A 
chapter on X-ray photography is contributed by J. D. Subtelny, S. Pruzansky and 
J. Subtelny who give a brief history of X-ray studies of speech and discuss the more 
recently developed methods including X-ray cinematography and the taking of 
sectional static pictures. The final chapters in this section, on speech analysis and 
synthesis, are contributed by H. L. Barney and H. K. Dunn. Several techniques for 
the acoustic analysis of speech are described, including the spectrograph and related 
methods (e.g. the correlatograph) and various types of mechanical speech recognizer. 
The chapter on analysis also gives a summary of some of the data on vowel formants, 
of the effects of distortion on articulation and of the statistical aspect of speech 
analysis. Methods of speech synthesis are dealt with very briefly and there is some 
reference to analysis-synthesis systems. 


The following section on “Phonetics in its Relation to Linguistics” begins with two
general chapters, the first on “Phonology in relation to phonetics”, by R. Jakobson
and M. Halle. This forms a link between the first part of the Manual, which deals 
with the acoustic and physiological description of the sounds of speech, and the 
second part, which is concerned with the linguistic function of these sounds. The 
discussion in this chapter is based on the notion of distinctive features but it reviews 
briefly other theories of the relations between sounds and phonemes. This is followed 
by an extended treatment of the framework of classification within which distinctive 
features may be considered and concludes with a discussion of various aspects of 
phoneme patterning. In the chapter on “Phonetics and linguistic evolution”, 
A. Martinet discusses from the point of view of a “realistic structuralist” the relation
of phonetics to the historical study of language and the importance of viewing 
linguistic history as the evolution of phonological systems—an evolution with its own 
appropriate time scale. 


The remainder of the section on linguistics consists of brief chapters for the most 
part giving a descriptive account of the main phonetic features of a particular group of 
languages. 

The section on “Phonetics in its Relation to other Sciences” begins with a chapter
on speech pathology by R. Luchsinger, in which he describes a number of the
experimental methods available for the study of speech disorders and gives data from 
various acoustic, physiological and neurological investigations in cases of cleft-palate, 
oesophageal speech and stuttering. Other chapters in this section deal with the 
relation of phonetics to sociology, by A. Sommerfelt, to musicology, by J. Smits van 
Waesberghe, and to aesthetics, by A. W. de Groot. 


The last section of the Manual, entitled “Phonetics and the Origin of Speech”, 
contains a chapter on the development of speech in the child by O. C. Irwin. The 
author presents a good deal of statistical material on the sounds made by children 
from birth to the age of 2½ years, statistics from which the pattern of normal
development can be derived and which show the effect upon this development of various
factors, including disabilities such as feeble-mindedness and cerebral palsy. Two 
chapters of a more general nature conclude the Manual: one on “The development
of speech in mankind”, by E. Buyssens, and the other on “General semantics”, by
F. Trojan.

THE EFFECT OF LISTENERS’ ANTICIPATIONS ON THE
INTELLIGIBILITY OF HEARD SPEECH 


D. J. Bruce 
(University of Reading) 


The experiment described demonstrates the effect of prior set on the intelligibility of
heard speech. Twenty subjects hear the same twelve-word sentences in the presence of
noise on five occasions. The sentences fall into two groups: one unprefaced by any
indication of topic, the other prefaced on each occasion by a different word, which
subjects are told is the topic to which the group refers. Only one of the sentences in
the latter group is fully appropriate to each word given. Wherever this conjunction
occurs in the order of testing, the sentence involved reaches its highest level of
intelligibility. When inappropriately prefaced, sentences are misinterpreted to a
considerable extent or resist interpretation.




“Speech is no more than a series of rough hints which the hearer must interpret.”
—L. R. Palmer. 


INTRODUCTION 


The concept of “set” is a familiar one in present-day experimental psychology.
Though the term is open to a number of interpretations, the general implications
of its use are that prior to an act of perception the percipient is in a state of
preparedness to receive a particular class of stimulation and to operate on that
stimulation in certain pre-defined ways. This preparedness is often described by
reference to the percipient’s “attitude formed” or “expectations established”, or, if the
operational aspect is to be stressed, by saying that he is ready to deal with the
forthcoming stimulation via a particular psychological process: “set to learn”, “set to
memorise”, “set to repeat”, and so on.

The importance of anticipatory state for what the percipient will, in fact, perceive 
has been demonstrated in many experiments involving the visual modality (Zangwill, 
1937, Luchins, 1945, Hall, 1950). In a very real sense, what a person sees is a 
function of what he expects to see. Everyday observation suggests that the
determining influence of prior set is also strong in the auditory modality, particularly in
regard to speech. 

When analysing the contextual constraints which obtain in our language, it is 
convenient to distinguish three classes:

(a) Syntactical constraints: those contributions to redundancy arising from the
accepted structure of the language.

(b) Verbal association: the determining influences coming from established verbal
habits.

(c) Situational context: determining influences, often non-verbal, which, in the
main, set the listener to expect a certain class of message before transmission has
begun. Examples would be facial expression, the particular locale in which the speech
transaction is to take place, preliminary instructions, etc.


Effects on speech intelligibility arising from the presence of constraints of type 
(a) and (b) have been described in some detail (Miller, 1951, Bruce, 1956), but the 
work done on constraints of type (c) has been small. Peters (1956) has shown that,
when test messages are prefaced by varying lengths of exposure to highly similar
training messages, listeners selected significantly more training-list words in
response to test messages as exposure to the training lists increased and as the
signal/noise ratio for the test lists decreased. Under these conditions, exposure
to training messages did not seriously affect the number of correct responses to test
messages, but when an error response to the latter did occur it was likely to be a
response correct for the training messages. Fry (1955) devised a persuasive
demonstration of the effect of set upon speech intelligibility. Listeners
heard a recording of two men conversing, but recognition of words was made 
difficult through artificial distortion. After the first unsuccessful hearing, listeners 
were told that the speakers were discussing the topic of “buying a new suit”. 
Most of them were then able to follow the whole conversation on the ensuing playback. 


However, such examples are sparse in the experimental literature. Clinically, 
though, the aid to intelligibility constituted by pre-setting is well recognised. In 
American speech reading classes, for example, it is the practice to give lessons in 
which a single, previously announced, topic is discussed. As in Fry’s demonstration, 
the class may be referred at the outset to a theme, such as “shopping”, “travel”,
or “dining out”, and it is found that the central unity of the topic facilitates
interpretation and conversational fluency (Pauls, 1956). The present investigation, which
is one of a series concerned with the influence of various forms of contextual 
constraint on speech perception, seeks to give a systematic demonstration of the 
effect of listeners’ anticipations on the intelligibility of speech. 


OUTLINE 


Although such anticipations are frequently established in everyday life through 
non-verbal influences, the situational context employed to pre-set subjects in this 
experiment was, for convenience, provided verbally. The crux of the experiment lay 
in the fact that listeners unwittingly heard the same five sentences on five different
occasions, each time preceded by a different word. These “keywords” were presented
to the subjects as the particular sphere of experience or activity to which the ensuing
sentences referred. In reality, only one of the sentences was fully appropriate to the
given keyword. Thus, in each of the five tests a listener heard:
(A) I tell you that our team will win the cup next year.

(B) We then had some bread and cheese to round off the meal. 
(C) You said it would rain but the sun has come out now. 

(D) To do the same trip by rail costs more and takes hours. 
(E) The last few days I have been sick with a bad cold. 


in random order, all prefaced by one of the keywords: “Sport”, “Food”,
“Weather”, “Travel”, or “Health”. Over the five tests the order of keywords
was also randomised for different subjects. It will be noted that the keyword topics 
are quite general, and the sentences themselves are not highly specialised. Each 
sentence starts with one of the most frequently occurring words in the language, 
and contains a high proportion of non-specific words. Suitably spaced within this 
framework are specific words having a more restricted context. 

In addition to the keyword material, subjects heard twenty unprefaced sentences 
which were employed for the determination of a working noise level, and a further 
five unprefaced sentences which were used as a control for the keyword material. 
The latter will be referred to as “cueless sentences”.


MATERIAL 


All the sentences employed in the experiment consisted of twelve monosyllables, 
and care was taken to match them on the basis of relative frequency of occurrence 
of constituent words. Selection of words was based on the Post Office Engineering 
Department’s research report “Analysis of the Sounds in a Sample of Telephone
Conversational Speech” (Swaffield, 1953). An attempt was also made to match the
sentences in terms of their structure and the relative phonetic difficulty of constituent 
words. 


SUBJECTS 


Twenty subjects took part in the tests. They shared a high standard of literacy, 
being drawn from the academic staff of Reading University. Their ages ranged from 
23 to 44, but the pure tone audiograms for all subjects fell within the conventional 
limits of normal hearing (+ 10 db.). 


SET-UP 


Listeners heard the sentences in the presence of noise. The latter was produced 
by means of a thyratron white-noise generator, kindly made available by 
Mr. R. Gregory of the Psychological Laboratory, University of Cambridge. Noise 
and speech were mixed to both ears over phones. Transmission was live (the 
experimenter sitting in a separate room), and amplitude clipped to give a uniform 
signal level. Sentences were read at a constant rate, each taking about 4-5 seconds. 
Experimenter and listener could communicate with one another over one line;
another line led to a Ferrograph tape-recorder on which the whole test was recorded. 

Residual noise was maintained in the listener’s phones throughout the test, and 
increased to the determined level by switching a remotely controlled relay when 
test material was read. 


INSTRUCTIONS 


Subjects were told that a test fell into three sections :— 
(1) The determination of a working noise level. 
(2) Hearing five sentences at that level, about which no preliminary information 
was to be given (cueless sentences). 
(3) Hearing five sentences at that level, referring to a common aspect of activity or 
experience which would be named before the sentences were read (keyword sentences). 
For all three sections the subjects’ task was the same: namely, to repeat into the
recording microphone as much as they could of each sentence, as soon as the 
experimenter had finished reading it. They were to guess if uncertain. In addition, 
subjects were asked to express their degree of certainty with respect to the cueless and 
keyword sentence repetitions. 


PROCEDURE 


On each test, subjects were taken first through the noise level determination 
sentences. Their attempted repetitions were checked for accuracy by the experimenter 
as they were given. On this basis of demonstrated intelligibility, the experimenter 
instructed subjects to raise or lower the noise level by adjusting the noise generator’s 
gain control. The working level required (indicated by previous pilot experiments) 
was 25% intelligibility, and when this had been demonstrated on four consecutive 
sentences, subjects were asked to leave the generator set at its corresponding reading 
for the rest of the session. Thus each subject, on each test, heard the keyword and 
cueless sentences at the same effective noise level, equivalent to 25% intelligibility
on the noise level determination sentences. Actual S/N ratios ranged from -11 db.
to -17 db. on first tests, values decreasing slightly on later tests.
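The stopping rule for the noise-level determination can be written as a small check. This is a sketch under stated assumptions, not the authors' procedure: "25% intelligibility" is taken to mean 3 of the 12 words of a sentence repeated correctly (with an optional tolerance), and the function names are hypothetical. A second helper converts the reported S/N ratios to noise-to-speech power ratios using the usual power definition S/N (db.) = 10 log10(P_speech / P_noise).

```python
def run_completed_at(correct_words, words_per_sentence=12, target=0.25,
                     run_length=4, tolerance_words=0):
    """Index of the sentence that completes `run_length` consecutive on-target
    sentences, or None if no such run occurs.

    Assumption: a sentence is "on target" if the number of words repeated
    correctly is within `tolerance_words` of target * words_per_sentence
    (3 words for a 12-word sentence).
    """
    target_words = target * words_per_sentence
    run = 0
    for i, correct in enumerate(correct_words):
        if abs(correct - target_words) <= tolerance_words:
            run += 1
            if run == run_length:
                return i
        else:
            run = 0  # an off-target sentence restarts the count
    return None


def noise_to_speech_power(snr_db):
    """Noise power divided by speech power for an S/N ratio given in dB."""
    return 10 ** (-snr_db / 10)


# Hypothetical words-correct counts for successive level-setting sentences.
print(run_completed_at([6, 5, 2, 3, 3, 3, 3]))  # 6
print(round(noise_to_speech_power(-11), 1))     # 12.6
print(round(noise_to_speech_power(-17), 1))     # 50.1
```

Read this way, the reported range of -11 to -17 db. means the noise carried roughly 13 to 50 times the power of the speech.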

With the working noise level established, subjects passed on to either cueless or
keyword sentences, according to the test’s place in the experimental design. The 
nature of the material had been described to subjects at the beginning of the test 
as under “Instructions”, and the second and third parts of the test were simply 
introduced by: “Now we come on to the keyword material. The sentences this
morning refer to ...”, or “We come now to the cueless sentences, no information
about these”, according to which material was involved. Subjects expressed their
degree of certainty along with their attempted repetitions. 

For a single subject, tests were separated from one another by an interval of 
about a week. 
HYPOTHESES


Expectations for the experimental outcome were :— 
(1) That a sentence would have greatest intelligibility when prefaced by the keyword 
most appropriate to it, regardless of the position in the serial order of tests at which 
this conjunction occurred. 
(2) That the effects on intelligibility of a less appropriate keyword would be shown 
in total failures to repeat a sentence, or parts of it, and in distortions of an attempted 
repetition in line with the keyword given. 


RATIONALE 


The randomisation of sentence and keyword order should allow an unbiased 
demonstration of any effect of pre-setting in subjects’ performance on the keyword 
material, but an additional check is provided by the cueless material. This consisted
of sentences with a similar quality of reference function, placing of specific and 
non-specific words, etc. as the keyword sentences, but no prior information was given 
about the topics concerned. Comparison of results on cueless and keyword material 
should produce a clear picture of the effect of pre-setting. 


RESULTS 


For the purposes of the discussion which follows, keywords other than the one 
which was fully appropriate to a particular sentence will be described as 
“inappropriate”.


1. The effect of repeated testing. 

As the experimental design required subjects to hear the same material five times, 
some cumulative effect of practice on scores might be anticipated, although the 
subjects’ lack of knowledge of results, and the slight decrease in S/N ratio as testing 
went on, suggest that the effect would be small. Taking total words correct for all 
five sentences in order of tests, the results were :—


KEYWORD SENTENCES

Test         1      2      3      4      5
% correct   33.2   29.5   27.9   26.2   28.9


CUELESS SENTENCES

Test         1      2      3      4      5
% correct   18.4   17.25  21.7   15.9   19.75
Such a practice effect is seen to be negligible, and, bearing in mind the randomisation
of keyword and sentence order, may be omitted from further consideration
of the experiment’s results.
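As a rough cross-check on this conclusion (not part of the original analysis), one can fit a least-squares slope to the percentage-correct figures across test order; a practice effect would show up as a clearly positive slope. The sketch assumes the five figures in each row above are in test order 1 to 5; the function name is illustrative only.

```python
def least_squares_slope(ys):
    """Slope of the least-squares line through (1, ys[0]), (2, ys[1]), ..."""
    xs = list(range(1, len(ys) + 1))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Percentage of words correct on Tests 1-5, copied from the tables above.
keyword = [33.2, 29.5, 27.9, 26.2, 28.9]
cueless = [18.4, 17.25, 21.7, 15.9, 19.75]

print(f"keyword: {least_squares_slope(keyword):+.2f} points per test")
print(f"cueless: {least_squares_slope(cueless):+.2f} points per test")
```

Neither slope is positive to any notable degree (about -1.2 points per test for the keyword sentences, about +0.1 for the cueless ones). The slight downward drift in the keyword scores is at least in the direction the decreasing S/N ratio on later tests might lead one to expect.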


2. The effect of appropriate and inappropriate pre-setting on intelligibility of the 
whole sentence. 

Figs. 1 to 5 illustrate the number of correct repetitions by the subject group of 
each word in each of the five sentences according to the keyword which prefaced them. 
Fig. 5 has a percentage scale as its ordinate for reasons to be explained later. 

It will be seen that in general the hypothesis that a sentence would be most 
intelligible when prefaced by its most appropriate keyword is confirmed. The 
percentages of words correct for the group according to keyword were as follows :— 


KEYWORD
Sentence    Sport    Food    Weather   Travel   Health
A           61.25*   40.0    36.7      37.1     41.7
B           23.3     34.2*   14.2      18.75    21.25
C           10.0     10.0    38.3*     21.7     5.0
D           3.3      4.2     5.0       15.8*    4.2
E           48.75    57.0    50.0      57.1     71.25*


* Indicates the conjunction of a sentence with its most appropriate keyword. 
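The comparison made in the next paragraph, each sentence's score under its appropriate keyword against its scores under the four inappropriate ones, can be tabulated mechanically. A minimal sketch with the table above hard-coded; the dictionary and function names are illustrative, not from the original.

```python
KEYWORDS = ["Sport", "Food", "Weather", "Travel", "Health"]

# Whole-sentence % words correct, columns in KEYWORDS order (from the table above).
SCORES = {
    "A": [61.25, 40.0, 36.7, 37.1, 41.7],
    "B": [23.3, 34.2, 14.2, 18.75, 21.25],
    "C": [10.0, 10.0, 38.3, 21.7, 5.0],
    "D": [3.3, 4.2, 5.0, 15.8, 4.2],
    "E": [48.75, 57.0, 50.0, 57.1, 71.25],
}
APPROPRIATE = {"A": "Sport", "B": "Food", "C": "Weather", "D": "Travel", "E": "Health"}


def compare(sentence):
    """Return (appropriate score, mean inappropriate score, max inappropriate score)."""
    row = SCORES[sentence]
    j = KEYWORDS.index(APPROPRIATE[sentence])
    others = [score for i, score in enumerate(row) if i != j]
    return row[j], sum(others) / len(others), max(others)


for s in SCORES:
    own, mean_other, max_other = compare(s)
    print(f"{s}: appropriate {own:5.2f}  inappropriate mean {mean_other:5.2f}  max {max_other:5.2f}")
```

In every row the appropriate score exceeds even the best inappropriate score; the largest inappropriate value for Sentence C (21.7, under “Travel”) is the exception discussed in the text.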


In spite of attempted matching, there is considerable variation between the five 
sentences in the overall level of intelligibility. However, this is not damaging to 
the intentions of the experiment. What is essential for the confirmation of the 
hypothesis is that scores for a given sentence under inappropriate keyword conditions 
should be around the same (lower) level. The results tabulated above provide this 
confirmation with the exception of one case, Sentence C under “Travel”. In
itself this exception helps to prove the rule, as will be shown in the more detailed
consideration to follow. The outcome of a Friedman non-parametric two-way
analysis of variance, taking subjects’ scores by keyword for each sentence, was to
reject the null hypothesis at the following levels :—


Sentence    Probability level
A           < .05
B           < .10
C           < .001
D           < .01
E           < .30
Fig. 1. Sentence A: I TELL YOU THAT OUR TEAM WILL WIN THE CUP NEXT YEAR.

Sentences B and E fail to reach the conventional level of significance, and the case 
of Sentence E will be separately discussed. 
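For readers unfamiliar with the Friedman test used here: each subject's scores are ranked across the k keyword conditions, and the rank sums R_j are combined as chi_r^2 = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1), which is referred to a chi-square distribution with k - 1 degrees of freedom. The sketch below computes that textbook form, giving tied scores their average rank but applying no tie correction. It illustrates the statistic only; the per-subject scores needed to reproduce the authors' probabilities are not given in the paper. SciPy's `scipy.stats.friedmanchisquare` offers a tie-corrected version.

```python
def average_ranks(row):
    """Rank values in ascending order (1 = lowest); ties share the mean of their ranks."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with row[order[i]].
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi_r^2 for a subjects x conditions table (no tie correction)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)


# Hypothetical words-correct counts (invented for illustration): one row per
# subject, columns in the order Sport, Food, Weather, Travel, Health.
demo = [
    [9, 5, 4, 6, 5],
    [8, 3, 4, 4, 6],
    [10, 6, 5, 5, 4],
    [7, 4, 2, 5, 3],
]
print(friedman_statistic(demo))
```

With five keywords there are 4 degrees of freedom, so a value above about 9.49 would be significant at the .05 level.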


The effect of the keywords on subjects’ anticipations and hence on the intelligibility 
of the ensuing sentences is emphasized by a similar analysis of scores on the cueless 
sentences. Here, it will be remembered, the material was not prefaced by indication 
of topic. The cueless sentences were :— 


(a) The mess is out of bounds to white troops for a week.

(b) To wear a well cut suit is the sign of good taste.

(c) We will give a prize to the top boy of each class.

(d) You could grow some flowers in that bed next to the lawn.

(e) I think you should get more trade with the three new shops.


Fig. 2. Sentence B: WE THEN HAD SOME BREAD AND CHEESE TO ROUND OFF THE MEAL.

For comparison with performance on the keyword material, scores on the cueless
sentences were arranged in the same test order sequence. Thus scores on the cueless
sentences when the condition “Sport” was obtaining for the keyword material were
grouped together, and similarly for the four other keyword conditions. The percentages
of words correct for the group under this arrangement were :—


POSITION IN TESTING

Sentence    With Sport   With Food   With Weather   With Travel   With Health
a           22.1         33.3        21.9           30.0          21.7
b           15.8         11.25       14.9           8.75          13.3
c           22.9         19.6        16.2           23.3          30.4
d           14.6         11.25       10.5           9.6           10.4
e           21.25        20.8        19.3           21.25         20.0


Fig. 3. Sentence C: YOU SAID IT WOULD RAIN BUT THE SUN HAS COME OUT NOW.

The hypothesis of differential intelligibility according to pre-setting demands that
there should be no significant change in the intelligibility of a cueless sentence on
different test occasions. A Friedman two-way analysis of variance provides the
following levels for the rejection of the null hypothesis, taking subjects’ scores on
each cueless sentence according to the grouping described above :—


Sentence    Probability level
a           > .30
b           > .20
c           > .50
d           > .95
e           > .99


It can be seen that the demand is met. 


Fig. 4. Sentence D: TO DO THE SAME TRIP BY RAIL COSTS MORE AND TAKES HOURS.


3. The effect of appropriate and inappropriate pre-setting on intelligibility of the
specific words.

As described earlier, each sentence in both keyword and cueless material contained
a proportion of words which played a more specific role in the sentence’s reference
function. These were :—


Keyword sentences          Cueless sentences
A. Team, Win, Cup          a. Mess, Bounds, Troops
B. Bread, Cheese, Meal     b. Wear, Suit, Taste
C. Rain, Sun               c. Prize, Boy, Class
D. Trip, Rail              d. Grow, Flowers, Lawn
E. Sick, Cold              e. Trade, Shops


Not unexpectedly, the effect of appropriate and inappropriate pre-setting on the
intelligibility of these specific words was more marked than on that of the sentence
as a whole. Percentage correct scores for the subject group on the specific words in
each sentence under the different keyword conditions were :—
KEYWORD
Sentence    Sport   Food    Weather   Travel   Health
A           51.7*   8.3     10.0      8.3      18.3
B           11.7    41.7*   8.3       11.7     11.7
C           7.5     7.5     62.5*     22.5     2.5
D           0.0     0.0     0.0       12.5*    0.0
E           32.5    34.2    40.0      40.0     70.0*


* Indicates, as before, the conjunction of a sentence with its most appropriate keyword.


Fig. 5. Sentence E: THE LAST FEW DAYS I HAVE BEEN SICK WITH A BAD COLD (percentage ordinate; original version of the sentence only).


As with the scores for whole sentences, there is variability in scoring level between 
sentences, but intelligibility within sentences under inappropriate conditions remains 
relatively constant, with the exception of Sentence C under “Travel”.
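The specific-word percentages are pooled over words and subjects: with 20 subjects and three specific words, for instance, each cell rests on 60 opportunities. A small sketch of that arithmetic, using the per-word counts for Sentence A under “Sport” reported in Section 4 below (TEAM 12, WIN 10, CUP 9); the function name is illustrative.

```python
def percent_words_correct(counts_per_word, n_subjects=20):
    """Pool per-word correct counts into one percentage.

    counts_per_word: how many of the n_subjects repeated each word correctly.
    """
    opportunities = n_subjects * len(counts_per_word)
    return 100 * sum(counts_per_word) / opportunities


# Sentence A under "Sport": TEAM, WIN and CUP heard correctly by 12, 10 and 9 subjects.
print(round(percent_words_correct([12, 10, 9]), 1))  # 51.7
```

The result, 31 of 60 or 51.7%, matches the starred entry for Sentence A in the table above.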
A control check with the specific words of the cueless sentences similarly supports
the whole sentence findings. Once again, percentage scores are tabulated to 
correspond with the keyword results in chronological order of testing :— 


POSITION IN TESTING

Sentence    With Sport   With Food   With Weather   With Travel   With Health
a           16.7         21.7        15.8           20.0          10.0
b           8.3          10.0        7.0            8.3           6.7
c           13.3         16.7        8.8            15.0          20.0
d           10.0         12.5        9.2            11.3          13.8
e           0.0          0.0         2.6            2.5           2.5


Significant differences within sentences are not apparent. 


4. The part played by the specific words in intelligibility of the whole sentence.

Reference to Figs. 1 to 5 will show how influential the intelligibility of a specific 
word was for the intelligibility of the whole sentence. For example, in Fig. 1 
intelligibility for the first part of the sentence is approximately the same irrespective 
of pre-setting conditions. But the rest of the sentence following the specific word 
TEAM has a consistently higher level of intelligibility when prefaced by the 
appropriate keyword “Sport” than under any of the inappropriate conditions. With
the correct perception of this specific word, the probability of successful interpretation 
of the remainder of the sentence was increased. Sentence A is, perhaps, the most 
striking demonstration of this effect, but the feature appears to a greater or lesser 
extent in each sentence. Performance on portions of the sentence subsequent to 
BREAD (Sentence B), RAIN (Sentence C), RAIL (Sentence D), and SICK (Sentence 
E) supports the conclusion that the specific words played a major role in the
intelligibility of the whole sentence. 


It is not suggested that only those people who heard a specific word correctly 
could make a correct interpretation of the words which followed it. In Fig. 1, for 
instance, with the keyword “Sport”, two more subjects heard the word YEAR
correctly than heard TEAM. And under inappropriate keyword
conditions the group’s signal lack of success on a specific word did not preclude a 
greater degree of correct interpretation of subsequent words. Certain words were 
obviously easier to hear than others. It is with the relative level of intelligibility 
under appropriate and inappropriate keyword conditions that we are concerned. 
Viewed in this light, the impression is gained that appropriate pre-setting not only 
increased the chances of specific words being intelligible, but, through their 
intelligibility, rendered the correct hearing of non-specific words more likely. If this 
is the case, the experiment provides a useful demonstration of associative habits. 
For example, the portion of Sentence A “...our team will win the cup...” was
heard correctly by the following number of subjects under the two conditions of 
appropriate and inappropriate pre-setting :— 


                                      OUR   TEAM   WILL   WIN   THE   CUP
Appropriate keyword:                  11    12     10     10    11    9
Average for inappropriate keywords:   6     2      ?      3     ?     2
(Values taken to the nearest whole number)


The non-specific words WILL and THE are heard correctly by over twice as many 
subjects when appropriately prefaced as when inappropriate conditions prevail. 
Something approaching the same factor obtains for the non-specific word OUR, which 
preceded the first specific word in this sentence. The associative facilitation in this 
case spread backwards as well as forwards. The feature is seen again in Sentence D, 
with the phrase “by rail...”. Correct scores on RAIL are few because a number
of subjects recorded ROAD or AIR as their interpretation of this test word, but, 


bearing this fact in mind, the backward facilitation is clear :— 


                                      BY   RAIL
Appropriate keyword:                  8    ?
Average for inappropriate keywords:   0    0


A further example of associative facilitation is given below, from Sentence C :— 


                                      RAIN   BUT   THE   SUN   HAS   COME   OUT   NOW
Appropriate keyword:                  11     5     14    14    7     13     15    8
Average for inappropriate keywords:   1      2     6     ?     ?     3      5     3


The effect of appropriate pre-setting on the intelligibility of non-specific words is 
even more apparent here than in the case from Sentence A. Such results suggest that 
the increase in intelligibility following correct interpretation of a specific word applied
to whole phrases rather than to isolated words. 

That this may be so is supported by the exceptional case of Sentence C under 
“Travel”, mentioned before. Few correct interpretations were recorded for test 
words before SUN, but the latter was heard by seven subjects. Thereafter, intelligi- 
bility for the sentence was greater than under the other inappropriate keyword 
conditions :— 


THE SUN HAS COME OUT NOW 
TRAVEL : 10 7 5 7 9 4 
Average for other inappropriate keywords : 4 2 1 2 3 2 


Of course, the occurrence in this phrase of the words COME and OUT may well
have increased its intelligibility when prefaced by the keyword “Travel”, but the
improvement starts with SUN, suggesting that this specific word stood a better chance
of being heard when prefaced by “Travel” than under the other inappropriate
keyword conditions. Incidentally, another case of backward facilitation is provided
by this example in the word THE.
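One way to express the “Travel” advantage in the table above is as a word-by-word ratio of the “Travel” count to the average for the other inappropriate keywords. A minimal sketch; because the original averages were rounded to whole numbers, the ratios are only indicative.

```python
words = ["THE", "SUN", "HAS", "COME", "OUT", "NOW"]
travel = [10, 7, 5, 7, 9, 4]  # subjects correct under "Travel"
others = [4, 2, 1, 2, 3, 2]   # rounded average under the other inappropriate keywords

# Ratio of correct repetitions under "Travel" to the other inappropriate keywords.
ratios = {w: t / o for w, t, o in zip(words, travel, others)}
for w, r in ratios.items():
    print(f"{w:5s} {r:.1f}x")
```

Every word in the phrase, including the non-specific THE, HAS and NOW, is heard by at least twice as many subjects under “Travel”.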

The unity of phrases in perception was also evident from the results on Sentence 
E, though in this case intelligibility was not contingent on hearing a specific word. 
During testing, E was the only sentence to be heard frequently out of context, and the 
reason for its greater intelligibility eluded the experimenter for some time. It was 
finally attributed to the fact that in delivery this sentence was being split into three 
discrete and equal phrase portions, thus : 


The last few days / I have been sick / with a bad cold.

Such a delivery was prevented by the structure of the other four keyword sentences. 
For the remainder of the experiment, a different version of Sentence E was employed, 
involving a change in two of the original words and an alteration in structure. 
This was : 


The last day or two I have been sick with a cold. 

The drop in intelligibility was dramatic (from 74% to 32% words correct for all 
tests), but unfortunately the new version was introduced too late in the experiment to 
supplant the old in discussion of results. Accordingly, the entries for Sentence E in 
tables given above refer to both versions, but the percentage ordinate in Fig. 5 
represents intelligibility for the original version of the sentence only. 


5. Subjects’ confidence in their interpretations. 

It will be remembered that subjects were instructed to guess if they were not sure. 
This prompts the question: “Do the conditions of appropriate and inappropriate 
pre-setting influence what subjects think they hear, or merely affect their guessing 
habits ?”. Although both influences are germane to everyday speech transaction, 
it is interesting to establish how certain subjects were of their incorrect interpretations 
because, in a real-life situation such as receiving instructions over a noisy line, it 
might be the case that a similar misapprehension is more likely to be acted on if the 
listener thinks he has heard it, than if he simply regards it as a reasonable guess— 
though this is hypothetical. 

The analysis of errors showed that a substantial proportion of subjects’ incorrect 
interpretations was reported as heard as opposed to guessed. For example, erroneous 
words in the attempted repetitions of Sentence A under inappropriate keyword 
conditions totalled 110. Of these, 43% were described as heard. This percentage 
figure includes both non-specific errors and errors specific to the misleading keyword 
(as does the 57% reported as guessed). Certainty of hearing was by no means 
consistent with correct interpretation. In a number of cases, correct repetitions were 
described as guessed, while versions of the test material such as those in italics 
below could be accompanied by a report of considerable confidence : 
Test Sentence: I tell you that our team will win the cup next year.
S.18 under Travel: I tell you that I too will leave next year.
S.13 under Food: I tell you that I feel more hungry than you are.


6. Perceptual distortions produced by inappropriate pre-setting. 

The examples given above lead the discussion to what experimental support was 
forthcoming for the hypothesis that attempted repetitions would be distorted in line 
with an inappropriate keyword. The results provided strong confirmation on this 
point, but are difficult to summarise adequately. Perhaps their inherent appeal to 
humour justifies a literal report of representative cases. 


(a) Example of distortions of a specific test word. 

The word TEAM in Sentence A, prefaced by the keyword “Food”, was repeated
variously as TEA (2 subjects), FOOD (2 subjects), CHEESE (1 subject), FEEL
(2 subjects), and TOO (2 subjects), the last version being followed respectively by
“will leave” and “would like”.


(b) Example of a subject’s repetitions of one sentence under the five keyword 
conditions. 

Test Sentence: I tell you that our team will win the cup next year.

Under Weather: I tell you that I see the wind in the south next year.

Under Travel: next year.

Under Sport: I tell you that our team will win the cup next year.

Under Food: I tell you that our tea will be something to do with beer

Under Health: I tell you that our team has been free from injury all this year.


Keywords and repetitions are given in order of testing. 


(c) Examples of phrase distortion under inappropriate keyword conditions. 
Test Phrase (B): We then had some bread and cheese ..... 


S.3 under Sport : We went round the first tee... .. 
S.11 under Sport : We have to play Reading C..... 
S.12 under Weather: The end of the rainy season..... 
S.10 under Travel: The main thing is to go there.....
S.6 under Health : We now have a canteen..... 

Test Phrase (C) : You said it would rain..... 

S.15 under Sport : The heat of the race..... 

S.1 under Food : We shall eat the rolls..... 

S.2 under Food : These sausage rolls..... 

S.4 under Travel : We visited Rome..... 


S.6 under Health : His children rave..... 
Test Phrase (E): ..... sick with a bad cold.
S.12 under Sport: ..... wicket keeper forward. 
S.8 under Food : ..... eating at home. 

S.2 under Weather: ..... rather cold. 

S.1 under Travel: ..... up to the North Pole.
S.8 under Travel: ..... by road. 


Of the 60 incorrect words given above, 48 were accompanied by clear indications of
whether subjects thought they heard them or were conscious of guessing. Reports of 
hearing accounted for 73% of this total. 


(d) Examples of sentence repetition given in the test immediately following the one 
that involved an appropriate keyword. 


The original hypothesis stated that a sentence would be most intelligible when 
prefaced by its most appropriate keyword, irrespective of the place of this conjunction 
in the order of tests. A corollary of the hypothesis is that, in the test(s) following a 
substantially correct repetition under appropriate keyword conditions, intelligibility 
should again return to a poor level with subsequent inappropriate keywords. The 
final examples to be given demonstrate that this was the case, and the extent of the 
deterioration shown emphasizes the major role anticipation plays in the perception of 
speech. 


Examples are arranged in the sequence :— Test sentence ; subject’s repetition
under appropriate keyword conditions ; subject’s repetition under inappropriate 
keyword conditions on the next test. Figures in parentheses indicate test order. 








Sentence A: I tell you that our team will win the cup next year. 
S.19 under Sport (4) I hope that our team will win the cup this year. 
S.19 under Weather (5) I hope that our will be a fine spell of weather. 
S.3 under Sport (2) I tell you that our team will win the cup this year. 
S.3 under Travel (3) I’m telling you that I have seen New Zealand. 
Sentence B: We then had some bread and cheese to round off the meal. 
S.4 under Food (2) We then had some bread and cheese to round off the meal. 
S.4 under Weather (3) @ young mausoleum. 

S.18 under Food (1) A of bread and cheese will round off the meal. 
S.18 under Sport (2) We now have some to round off the news.
Sentence C: You said it would rain but the sun has come out now.
S.16 under Weather (1) These days after rain the sun will come out now.
S.16 under Health (2) drug.
S.12 under Weather (4) rain and the sun has come out now.
S.12 under Travel (5) spread out round.
Sentence D: To do the same trip by rail costs more and takes hours.
S.2 under Travel (1) We do the same trip by rail what’s more its fast.
S.2 under Food (2) by law what’s more 

S.1 under Travel (2) going by rail is much slower than by air. 
S.1 under Sport (3) The of football 

Sentence E: The last few days I have been sick with a bad cold. 


S.12 under Health (1) The last few days I have been sick with a bad cold. 
S.12 under Food (2) The last two rolls are 

S.10 under Health (4) The last two days I have been at home with a bad cold. 
S.10 under Sport (5) The last three goals


7. Failures to repeat. 

Lastly, we must consider the hypothesis that the effect of an inappropriate keyword 
would also be shown in total failures to repeat a sentence, or parts of it. 

Failures to repeat anything of a sentence, even by guessing, had the following 
distribution : 

                          Total no. of sentences    Total no. of failures
Appropriate keyword:      100                       4
Inappropriate keyword:    399                       76
giving a percentage failure of 4% for appropriate keyword conditions, and 19% for 
inappropriate keyword conditions. 

Failures to repeat both whole sentences and parts of sentences, in terms of the 
number of test words involved, were distributed as follows :— 


                          Total no. of test words   Total no. of failures
Appropriate keyword:      1200                      468
Inappropriate keyword:    4788                      2713

giving a percentage failure of 39% for appropriate keyword conditions, and 57% 
for inappropriate keyword conditions. Entries in the tables above for inappropriate 
conditions are reduced by one sentence as the subject’s repetition was inadvertently 
not recorded. 

The percentage failure to repeat test words under appropriate keyword conditions 
was itself high, but under inappropriate conditions well over half the test material 
was sufficiently unintelligible to resist even guessing. A χ² test based on the number
of test words repeated correctly or incorrectly, and the number for which no repetition
was forthcoming, under the two conditions rejects the null hypothesis in favour of
a greater failure to repeat under inappropriate keyword conditions at a probability
level of P < 0.001.
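The failure percentages and a simplified form of the χ² test can be reproduced from the counts above. The paper's test distinguished three response categories (correct, incorrect, no repetition), but only the failure counts are tabulated, so the sketch below assumes a 2 × 2 table (repeated versus not repeated) and a Pearson χ² without continuity correction. It illustrates the calculation rather than reproducing the authors' figure. For one degree of freedom the upper-tail probability is erfc(sqrt(χ²/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, no continuity correction, for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def upper_tail_df1(chi2):
    """P(X > chi2) for a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Counts from the tables above: test words presented, and test words not repeated.
appropriate_total, appropriate_failed = 1200, 468
inappropriate_total, inappropriate_failed = 4788, 2713

print(f"failure, appropriate:   {100 * appropriate_failed / appropriate_total:.0f}%")
print(f"failure, inappropriate: {100 * inappropriate_failed / inappropriate_total:.0f}%")

# Rows: appropriate / inappropriate keyword; columns: repeated (rightly or wrongly) / not repeated.
chi2 = chi_square_2x2(
    appropriate_total - appropriate_failed, appropriate_failed,
    inappropriate_total - inappropriate_failed, inappropriate_failed,
)
p = upper_tail_df1(chi2)
print(f"chi-square = {chi2:.1f}, P = {p:.1e}")
```

With these counts χ² comes out at about 120 on one degree of freedom, far beyond 10.83, the critical value at the 0.001 level, which agrees in direction with the reported P < 0.001.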

Failures to repeat under appropriate keyword conditions related in the main to 
non-specific words, the specific words having a relatively high level of intelligibility 
as indicated before. In the tabulation of subjects’ repetitions the appearance of these 
specific words and their associates with the appropriate keyword and their subsequent 
disappearance when inappropriate conditions again obtained presents a most striking
picture, only partially conveyed by sections (b) and (d) above. 


CONCLUSION 


Cherry (1957) has said, when discussing the process of recognition: “We may 
recognize a friend from a certain set of attributes of his appearance when met in 
familiar surroundings, but recognition may depend upon an utterly different set if we 
encounter him unexpectedly in a strange place (say a foreign city). We may even fail 
to recognize him, or we may reject the recognition as an unlikely hypothesis ....
Being in certain familiar surroundings may be one of his essential attributes whereby 
we recognize him. A sign is not received isolated from an environment ; it is part 
of a whole complex situation ”. 

We have seen in this experiment the lack of recognition of “old friends” when
they appear out of context. It is remarkable that at the conclusion of testing only 
three of the twenty subjects declared a firm suspicion that they had heard the same 
keyword sentences on each occasion, and even they could not resist the influence of 
differential pre-setting. The other seventeen were alike in their surprise when 
informed of the repetition and the degree to which pre-setting affected their interpre- 
tations. In many cases the encouragement of expectations by a single spoken word 
was enough to ensure complete success or complete failure of sentence intelligibility. 
Where that failure took the form of erroneous interpretations, their source in the 
misleading keyword was often obvious, and they were frequently reported as heard. 
Substantial success on an appropriately prefaced sentence was rarely maintained when 
inappropriate conditions were subsequently encountered (with the exception of 
Sentence E), even though the subject might start the new interpretation with the same
correct words. Test words “hung together” in the subjects’ interpretation of the
sentence so that, with the return to inappropriate conditions, not only specific words 
were lost, but their non-specific associates as well. Whole phrases appeared and 
disappeared as the testing passed from inappropriate to appropriate and back to 
inappropriate conditions again. 

In this, the experiment provides another confirmation of the inter-dependence of 
word signs. For speech perception, the recognition value of the immediate sign 
seems to be insignificant compared to the information coming from contextual sources. 
The particular interest of these tests is that the critical source lay outside the 
messages themselves. The syntactical and associative structure of the latter remained 
constant, but the situation, as established by a given keyword, changed. With 
consequent change in set went drastic alterations in the intelligibility of word signs. 
“Being in certain familiar surroundings” was one of the essential attributes for
recognition to occur. 

Grateful thanks are expressed to the Medical Research Council for a grant towards 
apparatus used in this project. 
A NOTE ON THE INFORMATIVENESS OF PARTS
OF WORDS 


JEROME S. BRUNER AND DONALD O’DOWD
(Harvard University and Wesleyan University) 


The experiment reported in this paper was designed to determine which part of a 
printed word is most useful to a reader as a basis for recognition of the whole word. 
Versions of ninety common English nouns were prepared in which typographical 
reversals were inserted at the beginning, middle or end of the word. These were then 
presented in a tachistoscope to subjects who were asked to recognize the words. 

The results of the experiment showed that an error in the beginning of a word is 
significantly more disruptive of recognition than an error in the middle or the end 
of a word and that an error at the end is more disruptive than an error in the middle. 


If one takes the word AVIATION, a common enough eight-letter word, one may
ask which part of it is most useful to the reader in giving him a basis for recognizing
the word as a whole. The question is innocent enough, and one would immediately
think in terms of the concept of constraint: that, approximately, each successive
letter adds less information than its predecessor by virtue of the earlier letters
having progressively constrained the number of alternative words that might be made
by the addition of later letters. Yet the sequential structure of words seems
intuitively not to be precisely of this nature. For though the opening letters of our
sample word, AVIATION, provide notable constraints on what may follow, the
terminal ones do too, though to a more limited degree. Thus, AVI... may lead to
a set of words, all deriving from a common root: AVIATOR, AVIARY, as well as
to the indicia of tense, plurality, and the like. The least informative (or most
redundant, or least useful) portion of words appears to be their “middles”,
defining that term roughly, and we define it so for lack of a polished definition.
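The notion of constraint can be made concrete by counting how many words in a lexicon remain possible as letters are revealed from either end of the word. The sketch below is purely illustrative: the ten-word `LEXICON` is invented for the example and the helper names are hypothetical, so the counts say nothing about English as a whole.

```python
def prefix_candidates(word, lexicon):
    """How many lexicon words remain possible after seeing the first i letters."""
    return [sum(w.startswith(word[:i]) for w in lexicon) for i in range(1, len(word) + 1)]

def suffix_candidates(word, lexicon):
    """The same count, reading the word from its last letter backwards."""
    return [sum(w.endswith(word[-i:]) for w in lexicon) for i in range(1, len(word) + 1)]

# A toy lexicon, invented purely for illustration (not the study's materials).
LEXICON = ["AVIATION", "AVIATOR", "AVIARY", "AVIATE", "NATION",
           "STATION", "MOTION", "ACTION", "AVOID", "AXIS"]

print(prefix_candidates("AVIATION", LEXICON))  # [7, 5, 4, 4, 3, 1, 1, 1]
print(suffix_candidates("AVIATION", LEXICON))  # [5, 5, 5, 5, 3, 1, 1, 1]
```

In this toy case both ends narrow the field, the prefix somewhat faster, which is the "double" pattern the paragraph describes.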

We became curious as to whether indeed the identification of written words would 
reflect this apparent double-twig quality of English orthography. There appears to 
be ample evidence that in recognizing tachistoscopically presented words, subjects 
tend to pick up material on the left hand side of the fixation point first, and Mishkin 
and Forgays (cited in Hebb, 1949) have reported that Yiddish speaker-readers pick up 
material on the right first. And as early as forty years ago, Wagner (1918) had shown
that though the centre of a nonsense word be fixated, letters to the left of centre 
are more accurately recognized than those at the centre of fixation, and letters to the 
right next most accurately. Indeed, ten years before that, Wiegand (1908) found 
that as words were moved in from a distance toward the subject, the beginning and
end letters were most quickly recognized. 

But we were not satisfied with these findings, for all of them were too easily explicable
in terms of habits of looking or, to use the current term, scanning. Two proper
techniques for testing the informativeness of parts of words suggested themselves
to us. One consisted of the use of words with omitted letters which subjects would
be required to furnish. To use our sample word again:

**IATION 

AVI**ION 

AVIATI**

We have not employed this technique, though it is an obvious one. Our principal
objection to it is that it requires the subject to supply letters, and our thought was to
avoid this and use the literal elements of the word in our presentation.

To put the rationale of our own method in a nutshell, we inquired where in a word
a typographical reversal of letters would be most disruptive, granting that the reader
knew such a reversal had occurred. By the use of this technique we felt we could
avoid the known tendency of subjects to supply either common letters or common
transitions in the blank spaces used in the more obvious method. Thus, the three
forms of AVIATION to be correctly identified or “corrected” would be

VAIATION
AVITAION
AVIATINO

With such a technique, which part of a word would count most?
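The three reversal forms can be generated mechanically. This is a minimal sketch, assuming the beginning, middle and end are divided as in the footnote to the procedure (two/two/two letters for six-letter words, two/two/three for seven, two/three/three for eight) and that the last two letters of each segment are transposed. That rule reproduces the AVIATION example, but the paper does not state how the transposed pair was chosen, and `reversal_forms` is a hypothetical name.

```python
# Letters in (beginning, middle, end), following the footnote to the procedure.
SEGMENTS = {6: (2, 2, 2), 7: (2, 2, 3), 8: (2, 3, 3)}

def reversal_forms(word):
    """Return the (beginning, middle, end) reversal forms of a 6- to 8-letter word.

    Assumption: the last two letters of each segment are transposed.
    """
    word = word.upper()
    begin, middle, _ = SEGMENTS[len(word)]
    # Index of the last letter of each segment.
    last = (begin - 1, begin + middle - 1, len(word) - 1)
    forms = []
    for j in last:
        letters = list(word)
        letters[j - 1], letters[j] = letters[j], letters[j - 1]  # swap the pair
        forms.append("".join(letters))
    return tuple(forms)

print(reversal_forms("AVIATION"))  # ('VAIATION', 'AVITAION', 'AVIATINO')
```

A word with a doubled letter at one of the swap points would show no visible reversal under this rule, so such words would need a different pair or would have to be excluded.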


PROCEDURE 


The stimulus materials used were 90 common English nouns, six to eight letters
in length, drawn from the Thorndike-Lorge (1952) Semantic Count within the
frequency range from 50 to 150. Three different forms of each of the 90 words were
prepared, one with a transposition error at the beginning of the word, another with
a middle transposition error, and a third with a transposition error at the end of the
word.¹ From the 90 words three lists were constructed, each containing all 90 words.
In every list there were 30 words with beginning reversals, 30 with middle and 30
with end reversals, and a given word carried its error in a different part in each of
the three lists. The order of the words was the same in all three lists. The lists were
so constructed that every successive group of nine words contained three words with
each of the three error positions, arranged in random order.
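The paper does not say how the three lists were generated. One hypothetical way to meet every stated constraint is to shuffle three beginning, three middle and three end positions within each block of nine words for the first list, then rotate each word's position (beginning to middle, middle to end, end to beginning) to obtain the second and third lists. Rotation keeps the three-of-each balance in every block and guarantees that each word carries its error in a different part in each list. The function name is illustrative.

```python
import random

POSITIONS = ("beginning", "middle", "end")

def build_lists(words, seed=0):
    """Return three lists of (word, error_position) pairs in the same word order.

    Within every block of nine words each list has three words of each position,
    and across the three lists every word carries each position exactly once.
    """
    if len(words) % 9:
        raise ValueError("the number of words must be a multiple of 9")
    rng = random.Random(seed)
    first = []
    for _block in range(len(words) // 9):
        block = [p for p in POSITIONS for _ in range(3)]  # 3 of each position
        rng.shuffle(block)                                 # random order in the block
        first.extend(block)
    # Rotate each word's position: beginning -> middle -> end -> beginning.
    rotate = {p: POSITIONS[(i + 1) % 3] for i, p in enumerate(POSITIONS)}
    second = [rotate[p] for p in first]
    third = [rotate[p] for p in second]
    return [list(zip(words, lst)) for lst in (first, second, third)]
```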

Sixteen Harvard undergraduates served as Ss in this experiment. Each S was 
assigned at random to one of the three test lists. The 90 words from one list were 


¹ The beginning, middle and end of a word were defined as follows: for a six-letter word the
first two letters were designated the beginning; the middle two letters, the middle; the last
two letters, the end. The seven-letter words were divided in a two, two, three manner; the last
three letters constituted the end of the word. The eight-letter words were divided into the
beginning two letters, the middle three and the end three.
presented to the S one at a time in a Gerbrands-Dodge tachistoscope at 50 ms., an
exposure sufficient for ready recognition of the letters which were typed black on white 
in capital letters. The Ss were run individually. They were told that each word 
would contain a typographical reversal error and their task was to indicate what the 
correct word was, that is, the English word that had been altered by the typographical 
error. They were instructed to respond as quickly as possible, and their response 
latency was recorded on each presentation to the nearest hundredth of a second. 
On every presentation the S was alerted by a verbal signal just prior to the exposure. 
When the S failed to identify a word on the first trial, he was given a second, and 
if necessary a third exposure of the same duration. 

For each S, then, there are response latencies for 90 words containing letter reversals 
in their beginning, middle, and end portions. These are the primary data on which 
the analysis will be based. 


RESULTS 


The most general finding, which will be demonstrated in terms of two kinds of 
measure, was that an error in the beginning of a word is significantly more disruptive 
of recognition than an error in the middle or end of a word ; an error at the end of 
a word is somewhat more disruptive of identification than an error in the middle. 

One mode of analysis consisted in computing the median latency of response for
each S and determining the proportion of beginning, middle, and end error words
with latencies which fell above and below this median. If the three error forms were
equally recognizable, then approximately one half, or 15, of the 30 words of each type
would fall below the individual medians. Instead, on the average, 7.9 of the words
with beginning reversals fell below the median, as did 16.2 of those with end reversals
and 20.7 of those with middle reversals.
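The median-split count for a single subject can be sketched as follows. The data layout (a list of `(position, latency)` pairs) and the function name are assumptions, and the paper does not say how latencies exactly equal to the median were treated; here they are counted as not below it.

```python
from statistics import median

def below_median_counts(latencies):
    """latencies: one subject's (error_position, latency_in_seconds) pairs.

    Returns how many words of each error position had a latency below the
    subject's own median; words exactly at the median are not counted as below.
    """
    m = median(lat for _, lat in latencies)
    counts = {"beginning": 0, "middle": 0, "end": 0}
    for position, lat in latencies:
        if lat < m:
            counts[position] += 1
    return counts
```

With 30 words of each type, a subject for whom the three error positions were equally disruptive would give counts near 15 in each category.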

Three χ² comparisons were made on the data from each S: the beginning error
words were compared with the middle error words and with the end error words, and
the middle error words were compared with the end error words, each by means of a
2 × 2 comparison. The cells for all Ss on each comparison were pooled to yield three
χ² values. These were tested for evidence of heterogeneity as suggested by Snedecor
(1946). Table I summarizes the analysis. The words with beginning reversals are
clearly more difficult to recognize than those with either middle or end reversals.
The middle words are markedly less difficult than the beginning words, and they
appear to be somewhat less difficult than the end words, although this cannot be
definitely stated because of the heterogeneity of the component ratios.
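A sketch of the pooled and heterogeneity χ² computation, assuming the standard procedure: each subject contributes a 2 × 2 table (two error positions × below/above that subject's median), and the sum of the 16 individual χ² values (16 df) is split into the χ² of the pooled table (1 df) and a heterogeneity remainder (15 df), which matches the degrees of freedom in Table I. Whether the authors applied a continuity correction is not stated; this sketch uses none, and the function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square (1 df) for the 2 x 2 table [[a, b], [c, d]].

    A table with an empty row or column has a zero margin and would need
    special handling (the denominator becomes zero).
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def pooled_and_heterogeneity(tables):
    """tables: one (a, b, c, d) tuple per subject, e.g. rows = two error
    positions, columns = below / above that subject's median latency.

    Returns ((pooled_chi2, 1), (heterogeneity_chi2, k - 1)) with k = len(tables).
    """
    total = sum(chi_square_2x2(*t) for t in tables)          # k df in all
    pooled_cells = [sum(t[i] for t in tables) for i in range(4)]
    pooled = chi_square_2x2(*pooled_cells)                   # 1 df
    return (pooled, 1), (total - pooled, len(tables) - 1)    # remainder: k - 1 df
```

If every subject produced the same table, the pooled χ² would equal the sum of the individual values and the heterogeneity term would vanish; a large remainder, as in the middle-versus-end comparison, signals that subjects differed.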

TABLE I

χ² Values for Comparisons between Error Positions: Pooled Data, with Interaction χ² also Indicated

Comparison                    χ²       df     p
Beginning and Middle       176.048      1   < .001
    Interaction              4.043     15   > .99
Beginning and End           75.024      1   < .001
    Interaction             11.514     15   > .70
Middle and End              23.509      1   < .001
    Interaction             33.179     15   < .01

A second way of analyzing the data is by comparison of the three forms of each
word presented to different groups of Ss. Each of the 90 words was presented with
errors in each of its three parts to three different groups. It is reasonable to ask
what is the order of difficulty for the three versions of each word. If one gives a
score of 1 for the form of reversal in a word that is least disruptive, 2 for the next
most disruptive, and 3 for the most disruptive, the score for beginning errors is 235,
for end errors 175, and for middle errors 130. A reversal at the beginning is most
disruptive, then, in 62 of 90 words. End reversals are most disruptive in 23 of 90
words, and middle reversals in 5 of 90. There are six possible orders of difficulty:
beginning-middle-end, beginning-end-middle, etc. If all six orders were equally likely,
each should occur in approximately 15 of the 90 words. Inspection reveals that the
order of difficulty beginning-end-middle occurs in 37 of the 90 words, a frequency
sufficiently in excess of chance to be reliable beyond the .001 level. The order
beginning-middle-end also occurs in excess of chance, in 25 of 90 words (p < .01).
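The rank scoring and the chance comparison can be sketched as below. As a check on the scoring, each word contributes 3 + 2 + 1 = 6 points, and 235 + 175 + 130 = 540 = 6 × 90. The paper does not name the test behind its significance levels; an exact binomial tail against a chance probability of 1/6 per order is one reasonable choice and is what the sketch uses. The per-word difficulty measure passed to `rank_scores` (for example, a group's mean latency) and the function names are assumptions.

```python
from collections import Counter
from math import comb

def rank_scores(words):
    """words: one dict per word mapping error position -> difficulty measure
    (higher = more disruptive). Scores 3/2/1 from most to least disruptive and
    tallies the resulting order of difficulty. Ties are not handled."""
    totals, orders = Counter(), Counter()
    for difficulty in words:
        ranked = sorted(difficulty, key=difficulty.get, reverse=True)  # most first
        for score, position in zip((3, 2, 1), ranked):
            totals[position] += score
        orders["-".join(ranked)] += 1
    return totals, orders

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Six equally likely orders: probability 1/6 each, i.e. about 15 of 90 words.
p_37 = binomial_upper_tail(37, 90, 1 / 6)  # beginning-end-middle, 37 of 90
p_25 = binomial_upper_tail(25, 90, 1 / 6)  # beginning-middle-end, 25 of 90
```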


CONCLUSION 


The most evident conclusion is that the beginning of a word is more informative
than the end, and the end more than the middle. The scanning habits observed in the
studies cited (picking up information to the left of the centre of a word first) are
well justified not only in terms of following the sequential order of letters but also
in going to the most useful place first.
THE EFFECT OF REDUNDANCY IN SHADOWING ONE OF
TWO DICHOTIC MESSAGES 


NEVILLE MORAY AND ANNE TAYLOR
(University of Oxford) 


Passages of prose at varying orders of statistical approximation to English were
presented to subjects to shadow, while normal prose which they were to disregard
was presented to their other ear. It was found that the number of mistakes in
shadowing decreased linearly as the order of approximation to English increased,
while the number of omissions decreased linearly with the logarithm of the order of
approximation. These results are discussed in the light of recent experiments on attention.


INTRODUCTION 


Cherry (1953) reported that if dissimilar messages are presented simultaneously 
one to each ear of a subject (dichotically), and he is required to repeat out loud one 
of the messages as he hears it while disregarding the other, it is possible to perform 
this task with considerable efficiency. This task of repeating a continuous message
aloud as it is heard will be referred to throughout this paper as “shadowing”.
He also found that if two passages composed of political clichés were both presented
to both ears and the subject was required to shadow one of them, he could not do
this, his attention failing at the points where one cliché passed into another. The
reason seemed to be that the continuity between the clichés was too slight for the
subject to fix his attention on one message rather than the other. The present study
was undertaken to investigate further the importance of contextual constraints in 
such a task, and to discover at what stage in the transition from normal to random 
prose the subject would become unable to shadow the message efficiently. 


METHOD 


Our method differed from Cherry’s in that the two messages, normal and approxi- 
mate English, were presented dichotically. 

The subjects were 26 undergraduates and research workers, some of whom had 
previous experience in two-channel listening, but all of whom were inexperienced at 
shadowing. 

Eight passages of 100 words were composed by the method of Miller and Selfridge 
(1950) : that is, people were asked to add one word to sentences of which they could 
see varying numbers of words previous to the one they had to add. In this way 
passages of 1st, 2nd, 4th, 6th, 8th, 12th and 16th order approximations to English
were made. The expression “order of approximation” is due to Shannon (1949), and
a 6th order approximation, for example, is one in which the subject can see the five
words preceding that which he must add. A 1st order approximation is obtained by
picking words at random from a book, in this case a novel, so that the words have 
no contextual constraints, but occur with about the same frequency as they do in 
everyday use. A zero order approximation would be one in which the words were 
picked at random from a dictionary. During the experiment it became desirable to 
have samples of 5th and 7th order prose, and these were taken from the lists of 
Miller and Selfridge (op. cit.), by combining their shorter passages at each order in 
such a way as to produce 100-word passages. Care was taken that the transition
probabilities at the joining points were the same as for the rest of the passage. 
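Miller and Selfridge built their passages from human guesses. As a stand-in, the sketch below swaps in a corpus-count sampler in the spirit of Shannon's construction: for a k-th order approximation each new word is drawn from the words that followed the preceding k − 1 words somewhere in a source text, and a 1st order approximation simply samples words by frequency. The function names and the restart rule for contexts never seen in the corpus are assumptions made for the example; the corpus must contain at least k words.

```python
import random
from collections import defaultdict

def build_model(words, k):
    """Map each (k - 1)-word context to the list of words that followed it."""
    model = defaultdict(list)
    for i in range(k - 1, len(words)):
        model[tuple(words[i - k + 1:i])].append(words[i])
    return model

def approximate(words, k, length, seed=0):
    """Generate `length` words of a k-th order approximation to the corpus."""
    rng = random.Random(seed)
    model = build_model(words, k)
    out = list(words[:k - 1])                # open with a context from the corpus
    while len(out) < length:
        context = tuple(out[len(out) - (k - 1):]) if k > 1 else ()
        followers = model.get(context)
        if not followers:                    # unseen context: restart from a random one
            followers = model[rng.choice(list(model))]
        out.append(rng.choice(followers))    # duplicates make this frequency-weighted
    return out[:length]
```

Because the follower lists keep duplicates, common continuations are drawn more often, which is what makes a 1st order sample match everyday word frequencies.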

Subjects were given four practice trials with normal prose to both ears before the 
experiment. The material presented as the message which the subjects were to 
disregard was normal English prose taken from a light novel (“Perelandra” by
C. S. Lewis). All the passages were read in a monotone, and the two messages were
balanced at approximately equal intensities. The lists were presented in order from
the highest to the lowest order of approximation, so that any practice effect would
reduce the errors in those passages we expected to be harder for the subjects. One
passage of normal prose was included as the last to be shadowed, so that any falling 
off in performance due to fatigue could be estimated ; any such effect was found to 
be negligible. 

The subjects were thus required to shadow approximations to English while normal 
prose which they were to disregard was presented to their other ear. The subjects’ 
responses were recorded and later analysed for omissions and mistakes. Fresh words 
introduced were counted as omissions, but these were so small in number that to have 
counted them as mistakes would have made no difference. 

The apparatus used was a Brenell Mark IV tape recorder modified to give two 
independent outputs. 


RESULTS 


Our results are analysed in Table I. 

The errors were estimated by counting the number of individual words that the
subjects omitted or got wrong. Curves were fitted to the data, and the following
equations were obtained as curves of best fit:

$$\text{Omissions} = 26.7 - 21\,\log_{10}(\text{order of approximation})$$

$$\text{Mistakes} = 6.5 - 0.33 \times (\text{order of approximation})$$

Other curves were tested to see if they would also fit the data, in particular with a
view to checking the mistakes equation, as we had predicted theoretically that both
curves would be logarithmic. But the calculation of regression lines and analysis
of variance showed that the equations given were those of best fit. Neither plot
departs from its regression line by more than would be expected at the 1% level of
confidence, and the fit for the omissions is considerably better than this.
The graphs of the two equations are given in Figure 1. 
TABLE I

                              Order of approximation to English
                      1      2      4      5      6      7      8     12     16
Mean omissions      30.7   13.5   15.8   15.8   15.0   11.8    2.6    6.8    4.3
s.d. of mean        13.4    7.8    9.8   11.0    6.4    7.7    3.0    7.4    4.1
Mean mistakes        8.1    5.7    5.1    5.0    2.6    4.1    2.9    2.6    2.5
s.d. of mean         7.2    3.8    3.5    4.1    2.5    3.4    3.9    1.8    2.2
Mean total error    38.8   19.2   20.9   18.8    7.6   15.9    5.5    9.4    6.8
s.d. of mean        13.8    9.0   10.9   13.3    7.9   10.3    4.0    7.6    4.9


DISCUSSION 


Our results show how important redundancy in language is for rapid comprehension
and response. The method of preparing the material ensures that the prose is 
constrained progressively both with regard to logical structure (grammar and syntax) 
and vocabulary. It may be possible to separate the two effects, and preliminary results 
from another experiment (Taylor, unpublished) suggest that shadowing passages with 
correct syntax but random vocabulary is relatively easy. Related results were found 
by Oldfield and Zangwill (1938). 

Redundancy is closely tied to the order of approximation of the language, and this 
suggests that the omission rate is closely related to the amount of redundancy (the 
word is here used strictly in the information theory sense). This would relate our 
findings to those of Miller and Selfridge (1950), who used similar material in testing 
the effects of contextual constraints on recall. One difference is that we have found 
the effects of varying the constraints extending several degrees of approximation 
further than they did: to about 8th order, as compared with about 4th or 5th. It
seems impossible to get a measure of the absolute amount of information that is 
processed in the choice of the next word in any of our passages, since we do not 
know from how large a vocabulary those people who constructed them were drawing ; 
it is obvious that after talk about picnics in woods it is unlikely that words from the
language of nuclear physics will appear, but we do not know from how many words 
the choice is in fact being made. We assume that such passages do provide a 
measure of the relative amount of information with respect to one another, and that 
at least for the lower orders, redundancy is proportional to the logarithm of the order 
of approximation. A check on this was obtained by removing words at random 
from the passages and asking subjects to guess what the word was, a procedure found 
by Taylor (1956) to correlate highly with other measures of prose redundancy. 
Presumably when the number of words that a subject can see before adding his word 
(in the construction of these passages) exceeds that of the average length of an English 
sentence, syntactical redundancy must be nearly maximal. In this respect it is worth 
noting that the point at which the regression lines for both omissions and mistakes 
would cut the abscissa, corresponding to zero errors, is at about 21st order, and the
average length of an English sentence has been variously calculated as between 18
and 28 words (Yule, 1944). In practice, of course, zero errors are hardly ever
obtained.
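The redundancy check mentioned above, in which words are removed at random and subjects guess what they were, can be sketched as follows. The blank marker, the exact-match scoring rule and the function names are assumptions; Taylor's own scoring conventions may differ in detail.

```python
import random

BLANK = "____"

def make_cloze(words, n_deletions, seed=0):
    """Blank out n_deletions words at random positions.

    Returns the gapped passage and a key {position: deleted word}.
    """
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(words)), n_deletions))
    gapped = list(words)
    for i in positions:
        gapped[i] = BLANK
    return gapped, {i: words[i] for i in positions}

def proportion_restored(key, guesses):
    """Share of deleted words that a subject restored exactly (ignoring case)."""
    hits = sum(1 for i, word in key.items() if guesses.get(i, "").lower() == word.lower())
    return hits / len(key)
```

On this measure, a passage whose missing words are restored more often is the more redundant one.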


Why should errors and redundancy be related in this way? If, as is suggested by
the work of Broadbent (1952), listening and speaking are sufficiently different that
attention must be switched to one activity at the expense of the other if an efficient
response is to be made, then whenever the incoming message varies in some unusual
way which requires the subject to pay more attention to reception, output will
suffer, and the number of omissions will rise. It may be that the redundancy of the
message produces an effect analogous to persistence of vision in a cinema, in that
it allows the “gaps” in reception due to switching to be filled in.


The first requirement for successful shadowing is that the material should be 
perceived accurately ; and as Miller, Heise and Lichten (1951) have shown, the effect 
of context introducing redundancy in a perceptual task is considerable. It seems 
possible that incoming information (in the informal sense) is used to predict inductively 
what words will follow, thus making the task of reception easier by narrowing the 
class of possible words and syntactical structures from which an identification will have 
to be made. It is also possible that a motor set might similarly arise, limiting the
probable responses that may have to be made, and making the output task easier. 
At lower orders of approximation, where redundancy is low, this set will have to be 
changed more often as the context changes over smaller groups of words, and this 
task will presumably be performed at the expense of the motor output. It is worth 
noting that to randomise words not merely introduces random order but actually may 
make the perceptual task harder than would be predicted, for randomisation may 
actually contradict learnt transition probabilities. 


It is clear that in shadowing the immediate memory span (i.m.s.) is important.
Our subjects’ responses could be divided into two clearly distinguishable groups, 
which we shall call the “ continuous ” and “ discontinuous ” type of response. The 
former appeared to show no pauses in speaking longer than the pause between 
adjacent words, while the latter spoke groups of words with a silent period, extending 
over several words, between each output. At least at the lower orders of approxi- 
mation the continuous response was far more efficient, despite the interference that 
we might expect from Poulton’s (1958) findings. It should be noted that our 
conditions differed from his in that the material was presented continuously without 
pauses for the responses to be made. Some subjects volunteered the information that 
they changed from the discontinuous to the continuous form of response as the task 
became subjectively harder (presumably because accumulating words was giving them 
little contextual help). As the motor response lags behind the input, the gap will
come to exceed the i.m.s., and the short term memory traces will fade, so that there 
are more omissions and mistakes, the latter being sometimes in the form of synonyms. 
This introduction of a synonym is readily accounted for if we assume that the words
heard leave not only their own individual traces, which fade quickly, but also a more
lasting modification of the “set” for what is to follow.


We presented our material at a rate of about 2.5 words per second. It is interesting
to note that if the regression line fitted to the omissions data is prolonged back to
1st order approximation, the point at which it crosses the ordinate, given by the
constant in the equation, is 26.7. If we take this to mean that when shadowing
completely random words presented with the frequency of occurrence of normal
English at 2.5 words a second about 27% will have to be omitted, it may be that
this gives us a measure of the time that it takes to transfer attention from listening to
speaking and back. To calculate in round numbers, 2.5 words per second is 150 words
a minute, of which just over one quarter (25%) will be omitted. That is, about 115
a minute could be repeated. This corresponds to just under 2 per second, giving a
value for the switching rate that compares well with other studies in this field
(Broadbent, 1952; Cherry, 1953; Schubert and Parker, 1955). At higher orders the
message might be dealt with in “chunks”, to use Miller’s (1956) term, which would
allow a slower rate of switching. Further research into this auditory-verbal response
time is at present being conducted.
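The round-number arithmetic can be redone with the unrounded intercept. Treating the constant 26.7 as omissions per 100-word passage, that is, 26.7% of the words, 150 words a minute leaves roughly 110 repeated per minute, about 1.8 per second: slightly below the rounded figure of 115 but still just under 2 per second. The variable names are illustrative.

```python
words_per_second = 2.5
omitted_fraction = 26.7 / 100  # omissions equation at 1st order, per 100-word passage

words_per_minute = words_per_second * 60                         # 150 words a minute
repeated_per_minute = words_per_minute * (1 - omitted_fraction)  # words actually shadowed
repeated_per_second = repeated_per_minute / 60

print(f"{repeated_per_minute:.0f} words a minute, {repeated_per_second:.2f} per second")
```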


Another point of interest is the difference between the omissions and mistakes. 
The former show a logarithmic relation to the order of approximation, the latter an 
arithmetical one. The number of mistakes is always much less than the number of 
omissions. This would seem to suggest that the subject makes no response at all 
rather than a wrong response, and that as the material becomes more random, and 
so harder to follow, this effect becomes more marked. This is in agreement with 
Broadbent’s discovery (1952) that in responding to simultaneous and successive
stimuli even a right response causes more interference with the task than no response 
at all, though probably less than a wrong response. Mistakes would alter the transition 
probabilities for succeeding words and would therefore make errors more likely, 
while merely not responding at all would not have any such effect. 


There remains to be explained the finding, not apparent from the data presented 
in Table I, that at lower orders the omissions tend to occur in increasingly long 
sequences, so that at 2nd order it is not uncommon to find ten words being missed 
consecutively. We do not feel that we have enough evidence as yet to speculate in 
detail as to the nature of this finding, but very many of the subjects remarked, even
without being asked, that they had heard many more words than they spoke, but 
“just couldn’t get them out,” and there was a certain amount of vocalisation without 
producing any words during these blocks. The blocking seems to be a very compulsive 
effect, and produced far more interference than the material from the rejected message, 
of which few complained. This problem, as to the nature and properties of this block, 
deserves further research. There is also a certain amount of interference from the 
auditory and kinaesthetic monitoring, as against initiating, of speech. It is difficult 
to assess how great is the claim of this upon the subject’s attention : Poulton (1955)
found that his subjects were unable to shadow intelligibly without some auditory 
feedback, but other experiments have shown that with practised subjects it is possible 
to speak and shadow in the presence of completely masking tone (Cherry, 1955). 

Finally, one of the objects of the experiment was to see whether redundant and 
therefore more readily intelligible material, presented to the ear whose message the 
subject was trying to ignore, would interfere with shadowing when the contextual 
constraints in the shadowed message were slight. That is, would the subject tend 
to respond to the easier and more meaningful message? Some subjects did complain
of the interference from the rejected message, particularly at very low orders, but 
they were able to overcome it and did not repeat words from the rejected message. 

Our experiment was inspired by Cherry’s work on the shadowing of simultaneously 
presented political clichés, but our subjects did not switch to the competing message 
due to the failures of continuity in the message, as did his. This is probably because 
we, unlike Cherry, used dichotic presentation. Although our material was difficult 
for subjects to deal with, they could, nevertheless, pay attention to the input from 
one ear rather than the other. 


CONCLUSIONS 


1. In a two-channel listening situation, as the closeness with which the material 
approximates to English falls, omissions in a shadowing task rise logarithmically, and 
the mistakes arithmetically. 

2. The logarithmic relation of omissions is what would be predicted by regarding 
the change in the order of approximation to English as causing variation in the amount 
of redundancy (in the sense in which the word is used in information theory). This 
variation is arithmetically related to the logarithm of the order of approximation to 
English. 

3. Omissions rise faster than mistakes with falling redundancy ; i.e. the subject 
makes no response rather than a wrong response. 

4. A continuous form of response is more efficient than a discontinuous form of 
response. 

5. Attention cannot be paid equally to reception and reproduction of speech if 
either is to be performed efficiently : we infer a switching of attention whose time 
constant agrees with that found in other recent studies of attention. 
THE EFFECT OF THE RELATIVE INTENSITIES OF
DICHOTIC MESSAGES IN SPEECH SHADOWING 


NEVILLE MORAY 
(University of Oxford) 


Prose messages were presented simultaneously and dichotically to a subject who 
was required to shadow one of them. Five different relative intensities were used. It 
is shown that once the two messages are of equal intensity there is little further 
improvement in shadowing performance if the message the subject accepts is made 
more intense than the message the subject rejects. Even when the accepted message 
is −5 db with respect to the rejected message there is no significant rise in errors
when the number of mistakes is used as the criterion for assessing the subject’s 
accuracy of response. 


INTRODUCTION 


Cherry (1953) showed that if messages were presented dichotically to a subject who 
was required to repeat one of them while he was listening to it (“ shadowing ”), this 
task could be performed with considerable efficiency. The present study was under- 
taken as part of a series of experiments on the variables influencing attention in such 
studies. 


METHOD 


A Brenell Mark IV tape recorder was used, fitted for stereophonic recording but
modified to give two independent outputs. Five prose passages were presented to
each ear. These were light prose from a novel, and were so arranged that the intensity
of the shadowed passage with respect to the rejected passage was −10, −5, equal,
+5 or +10 db. All passages were spoken by a male voice. The subjects were all
well practised by taking part in two other experiments in two-channel listening 
previous to the one here described. Subjects tend to be “right-eared” or “left-eared”
in shadowing experiments, and in the present study only the “ear of choice” was
used for each subject. The subjects’ responses were recorded and later analysed for
omissions and mistakes. There were 14 subjects, students and research workers, aged
approximately 18 to 28.


RESULTS 


Cherry reported that the relative intensities of the two messages need not be closely 
matched for the subject to be able to shadow one of them, and the present study 
bears this out. It might have been expected that making the shadowed message more 
intense than the rejected one would improve performance, but this is not the case. 
The limits of efficiency of the response are virtually reached when the two messages
are equally intense, although the task does seem to be much easier for the subject when
the shadowed message is more intense. If we take the difference between the mean
number of omissions per hundred words when the two messages are equally intense
and when the shadowed message is 10 db more intense than the other, and submit this
difference to a t test, we find that the difference is only just significant at the 0.05
level of confidence; while if the mistakes are analysed, even when the shadowed
message is 5 db less intense than the rejected message there is no significant difference
between this and the +10 db condition.
These results are shown in Table I and Figure I. 


TABLE I

Intensity of shadowed message        Mistakes            Omissions
relative to rejected message (db)   Mean    S.D.       Mean    S.D.
    −10                              6.2     1.0       20.0    10.5
     −5                              2.6     2.0       14.1     8.1
      0                              1.8     1.3        5.1     3.8
     +5                              1.2     1.8        4.7     6.6
    +10                              1.0     1.1        3.0     3.5

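The t test in the Results can be sketched as follows. The paper does not say whether the test was paired; since the same 14 subjects took part in every condition, a paired test is one plausible reading, and that is what the sketch assumes. The per-subject scores needed to reproduce the reported result are not given in Table I, so the function is exercised only on made-up numbers. With 13 degrees of freedom the two-tailed 5% critical value of t is about 2.16.

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """t statistic and degrees of freedom for paired observations xs[i], ys[i]."""
    diffs = [x - y for x, y in zip(xs, ys)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    return mean(diffs) / se, len(diffs) - 1
```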
DISCUSSION 


Throughout the range there are more omissions than mistakes, a finding which 
agrees with other recent work in this field (Moray and Taylor, 1958). Peters (1956) 
did not find any significant rise in errors even at the −10 db condition of signal/noise
ratio, but his experimental design was rather different, in that the subjects had to 
recognize a signal when both signal and noise were presented to both ears : probably 
the difference means that the task the subjects had to perform was easier overall in 
his experiment. 

With regard to the small improvement with increasing intensity of the shadowed
message when it is more intense than the rejected message, the findings of Schubert
and Parker (1953) are of interest. They showed that if the successive words of a
message are sent to opposite ears, thus:

Right ear: The sat the 

Left ear: cat on mat. 
the accuracy of reports of the content of the message is increased if the blank periods 
of each ear are filled with white noise. They suggest that there is an active inhibiting 
mechanism, and that performance is improved if the subject listens not merely to 
one ear, but actually away from the other. If this is so, it is not surprising that the 
limits of performance are reached when the intensities of the two messages are not
greatly different. The present study confirms the impression given by all the work in
this field, that the efficiency of the blocking mechanisms in attention is very great. 


The chief importance of the present study is to show that in two-channel dichotic 
studies slight differences of intensity in the messages are not likely to introduce spurious 
results by unduly favouring the reception of one message. It is easy to match messages 
in such studies to within ±1 db, using subjective judgments by the subject, and if
the 1% level of confidence is used in establishing results, the intensity differences
will introduce only insignificant error into such studies.
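To put the matching tolerance in physical terms: by the standard definitions, a level difference of L db corresponds to an intensity (power) ratio of 10^(L/10) and a sound-pressure ratio of 10^(L/20). The short sketch below applies only these definitions to the relative levels of Table 1 and to the 1 db tolerance; the function names are illustrative.

```python
def intensity_ratio(level_db: float) -> float:
    """Intensity (power) ratio corresponding to a level difference in db."""
    return 10 ** (level_db / 10)


def pressure_ratio(level_db: float) -> float:
    """Sound-pressure (amplitude) ratio corresponding to a level difference in db."""
    return 10 ** (level_db / 20)


# Relative levels of the shadowed message in Table 1, plus the 1 db tolerance.
for level in (-10, -5, 0, 1, 5, 10):
    print(f"{level:+3d} db: intensity x{intensity_ratio(level):6.2f}, "
          f"pressure x{pressure_ratio(level):5.2f}")
```

A mismatch of 1 db is thus an intensity ratio of only about 1.26 (a pressure ratio of about 1.12), whereas the +10 db condition is a tenfold intensity difference.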





N. Moray 113 


Only the “ear of choice” was used in the present study, as it was desired to study
the most efficient responses, but observations from practice sessions and other 
experiments suggest that for the other ear, although initially the number of both 
omissions and mistakes would be rather higher, indicating a less efficient direction of 
attention, the level of performance would increase very rapidly with practice and would 
come to be not very greatly inferior to that of the ear which the subject favours 
initially. 


CONCLUSIONS 


1. In shadowing one of two dichotically presented prose passages the number of 
mistakes per hundred words is always considerably less than the number of omissions. 

2. Once the two messages are equally intense, there is practically no further
improvement in shadowing when the shadowed message is made more intense than the
rejected message, if the number of words omitted is used to measure the efficiency of
the performance.

3. If the number of mistakes is used as the score, there is no significant rise in
errors even when the shadowed message is -5 db with respect to the rejected message.

4. In studies of dichotic listening, it is sufficient to match the intensity of the two
messages within ±1 db if the 1% level of confidence is used in analysis.

THE SYNTHESIS OF ENGLISH VOWELS


G. F. ARNOLD, P. DENES, A. C. GIMSON, J. D. O’CONNOR AND J. L. M. TRIM
(University College, London) 


The construction and operation of a new speech synthesizer is described and an 
account is given of experiments designed to establish data for the successful synthesis 
of English vowels. 


During the last decade or two the chief practical aim of experimental investigations 
of speech has been the design of improved methods of telephone transmission. 
Numerous experiments have been carried out to throw further light on the relationship 
between the phoneme sequences of a linguistic text and the characteristics of the 
resulting acoustic wave and to find the rules for generating sound waves that will be 
recognized as a particular speech sequence. Most of the experiments have involved 
the use of mechanical speech recognizers and speech synthesizers of various designs. 
Investigations with a particular type of mechanical speech recognizer have been going 
on for some years at University College (Fry and Denes, 1953, 1955, 1957, 1958) and 
it was recently considered that these experiments might usefully be supplemented by 
work involving speech synthesis. 

There were at least two reasons for this decision. The speech recognition 
experiments were sufficiently advanced for us to consider seriously whether a phonemic 
analysis-synthesis speech transmission system could be realized by making the output 
of the mechanical speech recognizer control a speech synthesizer adapted to a 
phonemic input. The second reason for embarking on experiments in speech 
synthesis was that experience in this direction might also contribute to our continuing 
experiments in mechanical speech recognition. The central aim, then, of the proposed 
experiments was the design of a “phonemic synthesizer”, that is, a sound generator
that would accept an input consisting of a sequence of discrete phonemic signals and 
convert it into a recognizable speech sound wave. The operating rules for such a 
synthesizer are not yet known and the main task is to formulate such rules. A major 
problem in this connection is that of instructing the synthesizer to make the adjust- 
ments necessary to produce the right transition from sound to sound, transitions that 
appear to carry so much of the intelligibility of speech. 

With this aim in view, the question of the most suitable design for a synthesizer had 
to be considered. Any speech synthesizer consists of two parts, first, the actual sound 
generating circuits and second, the arrangement for controlling the generating circuits 
and determining the acoustic quality of the sound sequence produced. 

As far as the sound generating circuits themselves were concerned, there was a 
choice of several well-tried designs, all of the formant generating terminal analogue 
type and of the type in which the characteristics of the sound wave generated could 
be controlled by suitable voltages. Of these, the analogue computer type circuits used 
in the synthesizer developed by Lawrence were adopted. The choice was made partly
for electro-acoustic reasons and partly on grounds of convenience. The generating 
circuits include a larynx tone generator of controllable frequency. The output of this 
generator is applied through three separately adjustable amplitude control devices 
to three separate formant generating filters where the formant frequency of each 
circuit can be adjusted. There is a white noise generating circuit whose output is 
applied through an adjustable amplitude control to a circuit similar to the formant 
generating circuits but having a wider pass-band and a higher frequency range than 
the other three formant generators. In this way a band of noise with an adjustable 
centre-frequency can be produced. The outputs of all these circuits are added in 
parallel and the result presented through a loudspeaker. It is not intended to enter 
into the series-parallel controversy at this stage, but it is as well to mention that we 
have preferred to have available separate adjustment of each formant amplitude and 
that economy in the number of controls was not an important consideration. The 
limiting values of adjustment of the generating circuits will be given later on in this 
paper, after the synthesizer as a whole has been described. 
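The parallel arrangement just described (a larynx source feeding three separately scaled formant filters, plus a filtered noise branch, with all outputs summed) can be sketched digitally. This is an illustrative assumption, not a model of Lawrence's analogue computer circuits: each formant filter is replaced by a standard two-pole digital resonator (the form used in later software formant synthesizers such as Klatt's), the larynx tone by an impulse train, and the noise generator by uniform white noise. The sample rate, function names and all parameter values are hypothetical.

```python
import math
import random


def resonator(signal, freq_hz, bw_hz, fs_hz):
    """Two-pole digital resonator with unity gain at 0 c.p.s.

    y[n] = a*x[n] + b*y[n-1] + c*y[n-2]; the poles put a peak near freq_hz
    with an approximate bandwidth of bw_hz.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def parallel_synth(f0_hz, formants, noise, duration_s, fs_hz=16000, seed=0):
    """Sum separately scaled voiced formant branches and one noise branch.

    formants: list of (frequency_hz, bandwidth_hz, gain) for the voiced branches.
    noise: (centre_hz, bandwidth_hz, gain) for the noise branch.
    """
    n = round(duration_s * fs_hz)
    period = max(1, round(fs_hz / f0_hz))
    # Impulse train standing in for the larynx tone generator.
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    rng = random.Random(seed)
    white = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    out = [0.0] * n
    for freq, bw, gain in formants:
        branch = resonator(source, freq, bw, fs_hz)
        out = [o + gain * s for o, s in zip(out, branch)]
    centre, bw, gain = noise
    branch = resonator(white, centre, bw, fs_hz)
    return [o + gain * s for o, s in zip(out, branch)]


# Hypothetical settings: 150 c.p.s. larynx tone, three formants with the
# approximate bandwidths listed later in the paper, noise branch silenced.
samples = parallel_synth(
    150, [(500, 100, 1.0), (1500, 110, 0.5), (2500, 140, 0.25)],
    noise=(3000, 650, 0.0), duration_s=0.135)
print(len(samples))  # 2160
```

Because every branch has its own gain, the sketch mirrors the paper's preference for separate adjustment of each formant amplitude.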

Turning now to the question of the design of the control arrangement, the decisions 
had to be made from the point of view of what would be most suitable for the 
experiments to be carried out, rather than by considering only electronic and 
constructional convenience. Although most existing synthesizers are controlled by 
painted patterns in some form or other, it was thought that a control device similar 
to that of the Haskins Laboratories “Octopus” would be preferable for our purposes.
In the “Octopus” type of control a number of potentiometer knobs are provided to
set the various characteristics, such as the fundamental frequency and the formant 
frequencies, of each sound segment. There are as many of these sets of potentiometers 
as there are sound segments to be generated in succession and it is arranged that on 
pressing a button some kind of switching device will scan these control settings and 
produce the corresponding succession of sound qualities. As the principal aim of 
the experiments is the generation of a continuous speech sequence from a discrete 
input, the Octopus type control seemed most suitable, as in this way a series of fixed 
points could be set in the synthesized sequence with a more or less stereotyped form 
of transition. The form of the transition can be changed easily by substituting different 
circuits between the control potentiometers and the synthesizing circuits. Also if 
the control potentiometers are mounted on a suitable panel, the pattern of control 
settings can be inspected reasonably easily. 

The synthesizer in its present form is rack-mounted. The lower half of the rack 
carries the generating circuits and the matrix of potentiometer knobs on the upper 
part of the rack is the control panel. All the knobs controlling the quality of one 
segment are arranged in one column. There are ten columns and therefore ten sound 
segments can be generated in succession. There are twelve knobs in any one column. 
The top two knobs control the duration of the sound segment. The knob below 
controls the larynx frequency. The next six knobs control the frequencies and 
amplitudes of the three formants and the next two the centre frequency and amplitude
of the band of noise. The last knob in the column determines the rate of transition 
of the control voltages from the values determined by the settings of one column to 
those of the next. The scanning from column to column is performed by an electronic 
switch initiated by either one of two switches. The operation of one of the switches 
will result in the repeated scanning of the columns from left to right until the switch 
is re-set. A meter on the front panel can be switched to measure the output voltage of 
any of the potentiometers and graphs have been prepared to enable the experimenter 
to convert these voltage readings to frequencies and amplitudes as appropriate. 
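The column-by-column scanning, in which each setting is held for its segment's duration while the control voltage relaxes towards the new value through an RC network, can be sketched as a first-order lag applied to a stepwise target. The sketch assumes a single control parameter, a hypothetical 1 msec. update step, and that a column's transition knob governs the move from that column to the next; the `Column` type and `scan` function are illustrative names, not part of the apparatus.

```python
import math
from dataclasses import dataclass


@dataclass
class Column:
    """One column of the control panel, reduced to a single parameter."""
    duration_ms: float   # how long this segment lasts
    target: float        # e.g. a formant frequency in c.p.s.
    tau_ms: float        # RC time constant for the move to the NEXT column


def scan(columns, step_ms=1.0):
    """Return the control trajectory, one value per step_ms.

    Within each segment the value relaxes exponentially towards that
    segment's target, as the voltage on an RC network does after a step
    change; the time constant comes from the preceding column.
    """
    value = columns[0].target      # start settled on the first column
    tau = columns[0].tau_ms
    trajectory = []
    for col in columns:
        # Exact per-step factor for a first-order lag with time constant tau.
        alpha = 1.0 - math.exp(-step_ms / tau)
        for _ in range(round(col.duration_ms / step_ms)):
            value += alpha * (col.target - value)
            trajectory.append(value)
        tau = col.tau_ms           # governs the transition out of this column
    return trajectory


# Hypothetical second-formant settings for three 135 msec. segments.
cols = [Column(135, 2200, 30), Column(135, 900, 30), Column(135, 2200, 30)]
traj = scan(cols)
print(len(traj), round(traj[134]), round(traj[269]))  # 405 2200 914
```

With a 30 msec. time constant the value has not quite reached 900 by the end of the middle segment, which shows how the transition knob trades off speed of transition against how fully each target is reached.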


The ranges of adjustment of the various controls are as follows:

Duration of each segment: 13 msec. - 370 msec.

Larynx frequency: 65 c.p.s. - 340 c.p.s. (Range 1)
                  150 c.p.s. - 680 c.p.s. (Range 2)

Formant frequencies:
   First formant: 170 c.p.s. - 980 c.p.s.
   Second formant: 470 c.p.s. - 2200 c.p.s.
   Third formant: 1500 c.p.s. - 5000 c.p.s.
   Noise band range: 1200 c.p.s. - 5700 c.p.s.

Formant bandwidths:
   First formant: about 100 c.p.s.
   Second formant: about 110 c.p.s.
   Third formant: about 140 c.p.s.
   Noise band: about 650 c.p.s.

Transition time constants (RC value of network between control potentiometers and
generating circuits): 1 msec. - 100 msec.

Formant and noise amplitude control range: 60 db.
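The ranges above lend themselves to a simple validity check on a column's settings before a sequence is run. In the sketch below the limits are copied from the list; the dictionary keys and the function name are hypothetical, and only Range 1 of the larynx frequency is encoded.

```python
# Control ranges as listed in the text (frequencies in c.p.s., times in msec.).
RANGES = {
    "duration_ms": (13, 370),
    "larynx_hz": (65, 340),          # Range 1; Range 2 is 150-680 c.p.s.
    "f1_hz": (170, 980),
    "f2_hz": (470, 2200),
    "f3_hz": (1500, 5000),
    "noise_centre_hz": (1200, 5700),
    "transition_tc_ms": (1, 100),
}


def out_of_range(settings: dict) -> list[str]:
    """Return the names of settings that fall outside the synthesizer's ranges."""
    problems = []
    for name, value in settings.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(name)
    return problems


column = {"duration_ms": 135, "larynx_hz": 150, "f1_hz": 315,
          "f2_hz": 2200, "f3_hz": 3300, "transition_tc_ms": 100}
print(out_of_range(column))  # []
```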


When the construction of the synthesizer was completed, it was tested to see if its
actual performance followed the broad outlines of the design. A sentence was uttered
by a human speaker and analysed on a Sonagraph. The controls of the synthesizer
were then set to produce a similar acoustic sequence. A comparison of sonagrams
of the natural speech and of the synthesized version showed that on the whole the
synthesizer is well able to produce the required transitions, although the typical minor
fluctuations of the acoustic characteristics of humanly produced speech are missing
from the synthesized version. It may be of interest to note that the synthesized
sentence was readily recognizable by listeners who had no advance information about
what to expect.


The first experiments with the synthesizer were concerned with the vowel sounds. 
From the present state of our knowledge of the acoustics of speech, it is clear that 
the rules for synthesizing consonants will have to take into consideration the nature of
the neighbouring vowels (and consonants) and consonant experiments will therefore 
have to be carried out by synthesizing syllables. For this to be possible, a framework 
of vowels is necessary and the first experiments, therefore, were concerned with 
finding the formant frequencies and amplitudes and, as it turned out, the sound 
durations for a suitable set of vowel sounds. 


The first question to be answered was the number of vowels to be included in our 
tests. Southern British English¹ may be taken to have 20 vowels, of which twelve
are monophthongs and eight diphthongs. There were two conflicting requirements to 
consider when deciding what proportion of these was to be included in the tests. 
Each of the acoustic configurations to be considered in the later consonant experiments 
would have to be tried with every vowel and it would be an obvious advantage to keep 
the number of vowels small, so as to keep the number of acoustic patterns to be tried 
to a reasonably low figure. On the other hand, if the results of these tests were 
eventually to be used for synthesizing any English text, then all English vowels would 
have to be provided. It was thought that it might be possible to consider the 20 
English vowels as vocalic units, some consisting of sequences of a smaller number of 
sound elements. For instance, “long” vowels and the diphthongs might perhaps be 
treated as a sequence of two elements, each of which exists as a simple vowel. The
additional advantage of such an arrangement would be that an eventual phonemic 
synthesizer would require a smaller store of basic acoustic patterns. 


Throughout the experiments described in this paper, the test sounds were never 
switched on for long periods of time, but were heard always for durations similar to 
those used when vowels are produced in isolation by human speakers. In English 
the average duration of the vowels changes from sound to sound. After some 
preliminary tests carried out by the experimenters themselves it was decided to make 
the duration of all the vowel elements 135 msec. The vowels themselves were then 
made up from these elements by (1) combining two different elements to produce a
diphthong, (2) doubling the length of an element to produce a long vowel, or (3)
using the element by itself to produce a short vowel. The first two categories
therefore produced vowels of 270 msec. duration and the third a vowel of 135 msec. 
duration. 
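The three ways of building a vowel from 135 msec. elements can be expressed as a small rule. In the sketch below only the composition rule and the element duration come from the text; the vowel keywords and the ASCII element labels are hypothetical stand-ins for the paper's phonetic symbols.

```python
ELEMENT_MS = 135  # duration of every vowel element, in msec.


def compose(elements: list[str]) -> tuple[str, int]:
    """Classify a vowel built from elements and return (kind, duration in msec.).

    One element gives a short vowel, the same element twice a long vowel,
    and two different elements a diphthong.
    """
    if len(elements) == 1:
        kind = "short"
    elif len(elements) == 2:
        kind = "long" if elements[0] == elements[1] else "diphthong"
    else:
        raise ValueError("a vowel uses one or two elements")
    return kind, ELEMENT_MS * len(elements)


# Hypothetical keywords and ASCII element labels, one vowel of each type.
for vowel, parts in {"bit": ["I"], "beat": ["i", "i"], "bay": ["E", "I"]}.items():
    print(vowel, compose(parts))
```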

1 The pronunciation referred to is approximately equivalent to that described as Received
Pronunciation in: Daniel Jones, An English Pronouncing Dictionary (11th ed., 1956). The 20
vowels are those listed on p. xlii, with the exception of /ɔə/, which is no longer differentiated
from /ɔ:/ by the great majority of speakers. The symbols used in this paper are, however, those
occurring in Ida Ward, The Phonetics of English (4th ed., 1950).

TABLE 1
Vowel Vowel elements used
i it+i
I I
€ €
#2 #2
a D+ D
D D
fs) 9+ 9
U u
u utu
A A
3 3+ 3
3 3
el e+ I
ou 3+ u
al A+ I
au At+u
re) 9+ I
19 I+ A
£30 2+ta
ud uta

Another decision that was made at the start of the experiments was to use one
fundamental frequency only. The frequency selected was 150 c.p.s., which is
approximately the mean value of the male fundamental frequency range. Additional
experiments would have to be made later to see whether the formant results obtained
with this value of fundamental frequency also applied for other values. Having fixed 
the duration and the fundamental frequency, the next task was to arrive at the best 
possible formant structure for the vowel elements. The experimenters listened to 
the sound qualities produced and made adjustments of formant frequencies and 
amplitudes until they agreed that the best patterns had been selected. The Peterson 
and Barney (1952) results on formants were also taken into consideration, but it was 
found that considerable deviations from these values had to be made. A discussion 
of possible reasons for this will be found in a later section of this paper. It was also 
found that the formant pattern of a vowel element often had to be changed 
considerably from the adjustment that gave the most recognizable results when the 
element was used alone to form a vowel. This change from optimum to a compromise 
setting had to be made so that the quality of the vowel element should be acceptable 
when used in combination with another element to form a diphthong. Finally, the 
following ten vowel elements were arrived at: i, ɪ, ɛ, æ, ɒ, ɔ, ʊ, u, ʌ, ɜ. They were
used alone or in combination, as shown in Table 1, to form the 20 English vowels. 
It has already been said that the duration of the vowels was 135 or 270 msecs. 
The time constant of the transition from one element to another when they were used 
in combination was 100 msec. The formant frequencies of the 10 elements are shown
in Table 2.



G. F. Arnold, P. Denes et al. 119 
TABLE 2

Vowels: i ɪ ɛ æ ɑ ɒ ɔ ʊ u ʌ ɜ

Formant 1 (c.p.s.)
A     230 340 640 1000 720 520 170 270 1000 540
B     315 380 570 940 720 720 620 250 500 960 550
P-B   270 390 530 660 730 570 - 300 440 640 490
Formant 2 (c.p.s.)
A     3050 3100 3000 2400 - 1000 2950 720 720 1750 1900
B     2200 2200 2100 1750 1150 980 920 800 1150 1375 1475
P-B   2290 1990 1840 1720 1090 840 - 870 1020 1190 1350
Formant 3 (c.p.s.)
A     4300 3250 5000 2800 - 2500 - 4500 - 2750 3200
B     3300 2700 3000 2700 2450 2400 - - - 2450 2500
P-B   3010 2550 2480 2410 2440 2410 - 2240 2240 2390 1690

Values of first, second and third formant frequencies used in the experiments.
A. Formant frequencies of vowel elements used in the tests of Table 3.
B. Final values used in the tests of Table 5.
P-B. Peterson and Barney (1952) data.


Each of the 20 vowel patterns was synthesized four times and recorded on tape. 
The recordings of these 80 sounds were arranged in a random order by tape splicing. 
The tape was played to a group of 21 listeners, all of whom had received adequate 
training in the use of phonetic notation.¹ They were told that all the sounds they
would hear would be English vowel sounds in isolation and they were asked to listen 
to the test sounds and for each sound to write down the appropriate phonetic symbol. 
Before the test started, they were asked to listen to one example of each of the 
synthesized vowels and were told which vowel each example represented. 
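The test tape (20 vowels, four tokens each, spliced into random order) amounts to shuffling a list of 80 items. The sketch below uses Python's seeded `random.Random(...).shuffle`; the seed and the placeholder labels `V01`...`V20` are hypothetical. It also includes a helper that counts how many tokens of each vowel fall in the first half of the sequence, the kind of imbalance the authors report checking for in a later paragraph.

```python
import random
from collections import Counter


def presentation_order(labels, repeats=4, seed=None):
    """Return a randomized list containing each label `repeats` times."""
    items = [label for label in labels for _ in range(repeats)]
    random.Random(seed).shuffle(items)
    return items


def first_half_counts(order):
    """Count how many tokens of each label occur in the first half."""
    return Counter(order[: len(order) // 2])


vowels = [f"V{k:02d}" for k in range(1, 21)]   # placeholders for the 20 vowels
order = presentation_order(vowels, seed=1)
counts = first_half_counts(order)
# Vowels whose four tokens all landed in the same half of the tape.
lopsided = [v for v in vowels if counts[v] in (0, 4)]
print(len(order), lopsided)
```

Rerunning with a different seed gives a fresh random order, which corresponds to the authors' second presentation in a different order.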

The result of the test is shown in the confusion matrix in Table 3. It will be seen
that the percentage of correct recognitions for the diphthongs was rather high; the
lowest score was for /ou/ with 74% correct recognitions; there were three diphthongs
with scores between 80% and 90% and four with scores between 90% and 100%.

1 Other listeners with no previous phonetic training were also tried. For these listeners the
test was preceded by an explanation of the multiplicity of spellings used for the same English
vowel sound. They were given key words for each vowel and were asked to write down typical
spellings for the vowels they recognized, e.g. to write down ee for /i/, oo for /u/, etc. The key
words and the symbols to be used in the test were in front of the subjects throughout the test.
Despite these precautions, the results showed that the subjects made many mistakes which were
obviously due to writing down the wrong symbol. Better results might perhaps be obtained
with naive subjects by asking them to write down words (freely or from a list of key words)
containing the sound in question. In any case, the situation might not be so serious for
consonant tests, as English spelling is much more consistent for consonants than for vowels.

TABLE 3.
Response

It was to be expected that the diphthongs would be recognized more easily as they 
have two qualitatively different elements. The perception of a change of quality 
differentiates the diphthong from either of its elements, and places it within a 
restricted eight-member system. Furthermore, after recognizing one element the 
listener is helped by the strong sequential constraints involved. 
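A confusion matrix of the kind shown in Table 3 tabulates, for each stimulus, how often each response was written; the percentage correct for a vowel is its diagonal count divided by its row total. A minimal sketch, using toy responses invented for illustration (not data from the paper) and ASCII labels in place of phonetic symbols:

```python
from collections import Counter, defaultdict


def confusion_matrix(pairs):
    """Build {stimulus: Counter(response -> count)} from (stimulus, response) pairs."""
    matrix = defaultdict(Counter)
    for stimulus, response in pairs:
        matrix[stimulus][response] += 1
    return matrix


def percent_correct(matrix):
    """Return {stimulus: % of presentations answered with that same symbol}."""
    return {
        stim: 100.0 * row[stim] / sum(row.values())
        for stim, row in matrix.items()
    }


# Toy responses: 8 judgments per vowel (e.g. 4 tokens x 2 listeners).
toy = ([("ei", "ei")] * 8
       + [("ou", "ou")] * 6 + [("ou", "au")] * 2)
print(percent_correct(confusion_matrix(toy)))  # {'ei': 100.0, 'ou': 75.0}
```

The off-diagonal counts (here two "ou" tokens heard as "au") are what the matrix adds over a simple score: they show which vowels are confused with which.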

Among the pure vowels there are scores greater than 80% for /i, 1, €, p, u, u/ 
and 50% or less for /a, 9, A, 9, 3/. This shows that with the single exception of /n/ 
the close vowels were recognized well whilst open and centralized vowels fared badly. 
Additional experiments were carried out to determine whether the order of 
presentation affected the results in any way, because it was found that despite 
randomization all occurrences of some of the sounds were in the first half of the 
test and of other sounds in the second half. The items were presented in a different
random order, but although the results were somewhat different they were not 
significantly so. It was also thought that perhaps some of the sounds were more 
difficult to recognize because, whilst in the test all the vowels were heard in isolation, 
in fact not all of them occur in isolation in English. For instance, sounds like /a/ 
and /5/ and also most of the diphthongs occur in isolation as English words, whilst 
other sounds like /a/ or /2/ do not and may therefore be less readily recognizable 
when heard in isolation during a listening test. This led us to ask the question whether 
there was a difference in the recognizability of the various vowels in this kind of test, 
for reasons unconnected with the fact that the vowels were synthesized and not 
naturally produced. A further test was therefore carried out, using humanly 
produced instead of synthesized speech. Just as in the previous test, four examples of
each of the 20 vowels as produced by a human speaker were recorded, randomized 
and presented to a group of 29 listeners. The results are shown in the confusion 
matrix of Table 4. It will be seen that most of the scores are much too near the 
100% mark to show anything of the relative difficulty of the various sounds. There 
are however some conclusions that might still be drawn from the results. By far 
the lowest score was obtained with /a/ (51%) and an almost identically low score 
was obtained for this vowel in the test with synthesized vowels. This is possibly due 
to the fact that the quality of the /a/ vowel varies considerably from one English 
speaker to another. For this sound at least, then, a low score may not be due to 
imperfect synthesis alone. 


On reviewing the results shown in Table 3 for the synthesized vowels, it was decided 
that the scores obtained for many of the sounds were not high enough to be satis- 
factory. To a large extent this was probably due to the compromises that were 
necessary to make the vowel elements usable both in isolation and in combination with 
other elements to form diphthongs. For this reason it was decided to abandon the 
vowel element approach and to synthesize each vowel separately. For reasons of 
economy, it was also decided to deal with the following eleven pure vowels only:
/i, ɪ, ɛ, æ, ɑ, ɒ, ɔ, ʊ, u, ʌ, ɜ/. When trying to determine the optimum formant structure
for these eleven vowels, the Peterson and Barney data were again tried first. These 
values did not give very satisfactory results and the following causes will probably 
have contributed to this: (1) The Peterson and Barney results were derived from 
American English, whilst the present tests are carried out with reference to British 
English. For some sounds, like /p, 9, A, 3/ for instance, this would make a 
considerable difference. (2) The Peterson and Barney data are average values and 
it is by no means certain that the relationship between the values of the different 
formants of any one speaker is the same as for the average values. (3) It is quite
possible that when speech is synthesized from only three formants by the kind of 
terminal analogue methods used, the formant values for the most satisfactory sound 
qualities are somewhat different from those of naturally produced speech. 


For these reasons, and also because no data similar to those of Peterson and Barney 
are available for British English, it was decided to prepare sonagrams of the vowels 
of one speaker. The formant frequencies and amplitudes of these vowels were 
measured and the synthesizer controls set to produce the same formant values. The 
resulting vowels sounded so good to the experimenters that it was decided to record 
them and test them with a group of listeners. Again four examples of each vowel, 
that is 44 sounds in all, were recorded and randomized. It should be stated that at 
this stage the duration of the vowels was altered also. In the previous tests the 
durations were 135 msecs. and 270 msecs. for the short and long vowels respectively. 
In view of certain confusions between short and long vowels in the previous tests, 
the duration of the vowels in the present series was fixed at 220 msecs. for /ɪ, ɛ, æ,
ɒ, ʌ, ʊ/ and 400 msecs. for /i, ɑ, ɔ, u, ɜ/. The randomized sequence of vowels was
presented to a group of 31 listeners who were able to write down the phonetic symbols
for the vowels they recognized. As before, the subjects were told that all the sounds
they were about to hear were English vowels. Before the test proper started, the
subjects were allowed to hear each of the eleven synthesized vowels once and each
was identified for them.

TABLE 5.

1 One point that might be raised is that the scores in this last test were higher because there
were only eleven different sounds instead of 20 as in the previous tests, where diphthongs were
also used. This may well be so, but at the same time it is worth noting that in Table 3 the list
of errors shows that diphthongs were rarely recognized as pure vowels and vice versa.

TABLE 6

               i      ɪ      ɛ      æ      ɑ      ɒ      ɔ      ʊ      u      ʌ      ɜ
F.1 (c.p.s.)   315    380    570    940    720    720    620    500    250    960    550
F.2 (c.p.s.)   2200   2200   2100   1750   1150   980    920    1150   800    1750   1900
F.3 (c.p.s.)   3300   2700   3000   2700   2450   2400   -      -      -      2450   2500
A.1 (db)       0      0      0      0      0      0      0      0      0      0      0
A.2 (db)       aan —2 05 -—05 -—3 2 -6 —5 aS al —4
A.3 (db)       —8 —s —6 5 —14 —16 —14 —10

Formant values for synthesized vowels used in final test. These values
originate from the measurement of the formants of a single human speaker.

The results obtained were much better than in the previous tests.¹ Only
/1, €, u, A/ scored less than 85%. It is reasonable to assume that the somewhat
lower scores obtained with /u/ and /a/ were not produced by faults in synthesis, 
since they were recognized least successfully in a previous test in which humanly 
produced vowel sounds were used. The formants of /1/ and /e/, however, were 
changed from the values derived from analysis until the experimenters judged them 
to be more clearly distinguished from the other sounds. Fresh tests were carried 
out in which the modified formant values were used for /1/ and /e/ whilst the formant 
values derived from analysis were used for all other vowels. The values of the formant 
frequencies and amplitudes are shown in Tables 2 and 6. The results of the test are 
shown in Table 5. It will be seen that these results are comparable with those 
obtained from tests with humanly produced speech sounds (Table 4) and therefore 
it was reasonable to adopt these formant values for the consonant experiments in the 
future. The last series of experiments also showed that a successful pattern of 
future experimentation in this field might be to analyse the speech sounds produced 
by a human speaker and to synthesize their salient characteristics. Experience shows 
however that simply using values derived from humanly produced speech in formant 
synthesizers does not give the most intelligible results in all cases, and articulation
tests will show where modifications are necessary and the extent to which these are 
successful. 


To sum up, then, these first tests carried out with the new synthesizer produced a 
set of values for synthesizing vowels of qualities readily recognizable by listeners. 
These vowels will be used in the consonant tests which will follow, in which the rules 
for synthesizing the consonants in syllables and words in combination with these 
vowels will be studied. 


The authors wish to express their thanks to Miss S. M. Fitz-Gibbon for her 
assistance in carrying out some of the tests described in this paper and in treating 
the data. 



REFERENCES 


FRY, D. B. and DENES, P. (1953). Mechanical speech recognition, in Communication Theory
(London), 426.

FRY, D. B. and DENES, P. (1955). Experiments in mechanical speech recognition, in Information
Theory (London), 206.

FRY, D. B. and DENES, P. (1957). On presenting the output of a mechanical speech recognizer.
J. acoust. Soc. Amer., 29, 364.

FRY, D. B. and DENES, P. (1958). The solution of some fundamental problems in mechanical
speech recognition. Language and Speech, 1, 35.

PETERSON, G. E. and BARNEY, H. L. (1952). Control methods used in a study of the vowels.
J. acoust. Soc. Amer., 24, 175.










EXPERIMENTS IN THE PERCEPTION OF STRESS 


D. B. FRY
(University College, London) 


Differences of stress are perceived by the listener as variations in a complex pattern 
bounded by four psychological dimensions: length, loudness, pitch and quality.
The physical correlates of these perceptual factors are the duration, intensity, 
fundamental frequency and formant structure of the speech sound waves. Experiments
have been made in order to measure the effect on stress judgments of changes in three 
of these physical dimensions, duration, intensity and fundamental frequency. The 
experimental method was to synthesize speech stimuli in which these quantities could 
be controlled and varied over a considerable range and to use this material to construct 
listening tests which were carried out by large groups of subjects. 

English word-pairs of the type subject, object, digest, formed the language material 
for the tests; in the first experiment variations in duration were combined with 
variations in intensity. The results showed that both duration and intensity act as cues 
in stress judgments; duration produced the greater overall fluctuation in the judgments
and a method is suggested of making a quantitative comparison of the effect of the 
two cues. 
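The first experiment crosses variations in duration with variations in intensity. As a generic illustration of that kind of factorial design, the sketch below builds every combination of a word with a set of duration ratios and intensity differences between its two syllables, and tallies the proportion of judgments that placed stress on the first syllable in each condition. The levels, function names and judgments are hypothetical, and the paper's own method of comparing the two cues quantitatively is not reproduced here.

```python
from itertools import product


def stimulus_grid(words, duration_ratios, intensity_diffs_db):
    """All (word, duration ratio, intensity difference) combinations.

    duration_ratio: first-syllable vowel duration / second-syllable duration.
    intensity_diff_db: first-syllable level minus second-syllable level.
    """
    return list(product(words, duration_ratios, intensity_diffs_db))


def proportion_first_stressed(judgments):
    """Map each (duration_ratio, intensity_diff) condition to the share of
    judgments that placed stress on the first syllable.

    judgments: iterable of ((word, duration_ratio, intensity_diff), first_stressed)
    """
    totals, firsts = {}, {}
    for (_word, ratio, diff), first in judgments:
        key = (ratio, diff)
        totals[key] = totals.get(key, 0) + 1
        firsts[key] = firsts.get(key, 0) + int(first)
    return {key: firsts[key] / totals[key] for key in totals}


# Hypothetical levels: 2 words x 3 duration ratios x 3 intensity differences.
grid = stimulus_grid(["object", "subject"], [0.5, 1.0, 2.0], [-5, 0, 5])
print(len(grid))  # 18
```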

The second experiment combined duration changes with step changes of fundamental 
frequency. The results showed that the direction of a step change of frequency had 
a strong influence on stress judgments but the magnitude of the frequency change had 
no marked effect. The tendency was for a higher syllable to be heard as stressed in 
preference to a lower one. 

The third test included variations in fundamental frequency within one syllable and 
contained a range of patterns which imposed sentence intonation on the test items. 
The results again demonstrated the all-or-none effect of frequency changes and showed 
that this may outweigh the duration cue altogether. 


STRESS AS A TERM IN A DESCRIPTIVE SYSTEM 


A number of the terms used in descriptive linguistics refer to events that occur at 
different levels and at different stages in the process of speech communication. One 
such term is “ stress” which generally denotes both an aspect of the articulatory or 
motor side of speech and also a feature of the sounds perceived by a listener. Part 
of the usefulness of the term to linguistic description lies in the very fact that it spans 
both the transmission and the reception phase of speech, but its use sometimes forms 
the basis for the unjustifiable assumption of a one-to-one correlation between trans- 
mission and reception in this particular domain. Writers on phonetics and linguistics 
generally use “ stress ” to denote either “ the degree of force with which a syllable is 
uttered ” (Jones, 1949) or “ degree of loudness ” (Bloch and Trager, 1942), but it is 
often implied or explicitly stated that these two things are completely correlated ; 
Bloomfield (1933), for example, says that stress “consists in speaking one of these 
syllables louder than the other or others”.

An experimental approach to the problem of stress requires somewhat more rigorous
formulation of the uses of the term and of the types of event that are open to 
investigation. In the common usage, succeeding parts of an utterance are said to 
bear stronger or weaker stress in comparison with other parts of the utterance, and 
normally the parts so characterized are syllables. Hence stress is a term that refers 
to a relation between syllables and successive variations in this relation constitute the 
rhythmic pattern of an utterance just as successive variations in tone-relations make 
up the intonation pattern. The rhythmic pattern plays a very important role in 
English and the work reported in this paper deals only with examples drawn from 
this language. 


Stress as a descriptive term may, then, on the one hand, refer to features of the 
skilled movements that constitute the transmission side of speech and it may be 
possible to devise experimental methods of measuring variations in the force of 
utterance during a speech sequence. If this were done, it is likely that the variations 
would be seen to be more closely connected with phonation than with articulation 
and it is unlikely that there would be an exact correlation between degrees of stress 
in the linguistic sense and measured force of utterance. On the other hand, stress 
may refer to the reception of speech and in this case it denotes a complex of perceptual 
dimensions. 


A sound stimulus may be varied along several physical dimensions, and such 
variations, provided they fall within certain ranges, will give rise to changes in basic 
psychological dimensions: pitch, loudness, quality and length. These are basic 
dimensions in the sense of being independently variable. It is possible to present to 
a listener sounds which he will judge to be different in pitch but the same loudness, 
quality and length, or different in quality, but the same pitch, loudness and length, and 
so on. In addition to these basic perceptual dimensions, there are others which 
constitute in effect a complex of these first four. Thus in psycho-physical experiments 
on the “volume” or on the “density” of sounds, the listener operates with a 
complex of the simple dimensions. 


Perception of the sounds of speech always involves a complex of these dimensions ; 
the listener is never concerned exclusively with one of them. He takes in continuous 
variations along all of the basic dimensions and his linguistic judgments are 
determined by their interaction. This fact is only one more illustration of the 
redundant character of speech as a mode of communication. The listener, in normal 
conditions, has a number of cues that he can use as the basis of any single judgment 
and these cues are provided by variations in any and all of the perceptual dimensions. 
On the other hand, the listener may, for a specific judgment, be more dependent 
on one than on another : in establishing a phonemic sequence, he may depend very 
largely on succeeding variations in quality ; in taking in an intonation pattern, he 
may commonly rely mainly on variations in pitch.


PERCEPTUAL FACTORS IN STRESS JUDGMENTS


In the case of stress judgments, even in one particular language, all four dimensions 
may play a part and this accounts to some extent for the difficulty of defining the 
term and for the occurrence in descriptive linguistics of terms such as “ pitch accent ”, 
“ force accent ”, etc., which are used to denote the supremacy of one dimension over 
the others in specific circumstances. If we consider the stress patterns of English in 
perceptual terms, there are a number of factors that influence a judgment of stress. 
The listener relies on differences in (1) the length of syllables, (2) the loudness of 
syllables, (3) the pitch of syllables, (4) the sound qualities occurring in the syllables 
and (5) on the kinaesthetic memories associated with his own production of the
syllables he is receiving. 


These factors form a complex in which no one is independent of the others. Thus 
a stress judgment may be influenced by the length of a syllable, and particularly by 
the length of the vowel that it contains, but not independently of the vowel quality. 
In the English word /mɔːbid/ the first syllable is perceived as stressed, partly because
the first vowel is long. This vowel is, however, long in opposition to the first vowel
of /mɔːbiditi/ and not in contrast with the second vowel /i/, for in the latter word,
the first vowel is still long in contrast with the second, although the stress is now 
perceived to be on the second syllable. 


Certain quality differences in English have particular significance in stress 
judgments. The substitution of the neutral vowel /ə/ for some other vowel, the
reduction of a diphthong to a pure vowel, or the centralization of a vowel are all 
powerful cues in the judgment of stress. Some features of consonant quality, such as 
the strength of friction or aspiration and the sharpness of onset of the consonant 
sound may act in a similar way. 


It has sometimes been denied that the listener’s kinaesthetic memories of his own 
speech can play any part in the reception of speech. Experimental demonstration of 
the operation of this factor may indeed be difficult though it might possibly be 
achieved. The arguments generally advanced against this view are however largely 
irrelevant. Thus it is said that a listener may be able to understand speech in a 
language of which he cannot himself utter a word, and this is taken to “ prove ” that 
kinaesthetic patterns contribute nothing to the reception of speech. This, however, is 
merely to assume an identity between two statements, (1) that kinaesthetic patterns 
contribute to the reception of speech, and (2) that kinaesthetic patterns are essential 
for the reception of speech. The second of these is, of course, quite unjustifiable and 
is indeed contrary to the whole character of speech as a mode of communication. 
The redundancy of speech has been demonstrated in a number of ways ; it is important 
to realize that it is to be found at every level of speech activity and as a consequence 
there is scarcely any feature which can be said to be essential for speech communica- 
tion. A system that is common to the speaker and the listener and a time pattern 
of change in the medium of communication are indeed the only two factors that can
be regarded as essential. For the rest, speech consists of features that sub-serve these 
requirements and operate in combinations that depend upon the conditions of the 
moment. The purpose of experimental work is to explore these combinations and to 
study their relation to the conditions in which they occur. In ordinary working, and 
particularly in the case of a listener receiving his native language, it is probable that 
the listener’s kinaesthetic memories play some part in his reception of speech. If 
this is so, it is likely that the contribution will be particularly strong in the case of 
stress judgments since rhythm of all kinds has a powerful motor component. 


PHYSICAL CORRELATES OF THE PERCEPTUAL FACTORS 


In order to experiment with judgments of stress it is necessary first to determine 
the physical dimensions of the speech stimulus that we may expect to be closely
correlated with the perceptual factors. We have already said that the influence of 
the listener’s kinaesthetic images is not directly accessible to experimental investigation 
but the other four factors can be assigned physical correlates reliably from established 
experimental data. This does not, of course, mean that there is in any case a 
one-to-one correlation between the stimulus dimension and the perceptual effect. 

The length of sounds will be closely correlated with the duration of given sections 
of the speech wave-form. Differences in loudness will be associated in part with the 
intensity of the speech wave-motions and this in turn will depend upon the frequency 
complex or formant structure of the sound. Pitch differences will depend mainly 
upon variations in fundamental frequency and quality differences on variations in 
formant structure. 

The basic method of experimental study consists in presenting to a group of 
listeners speech sounds in which these physical dimensions can be varied independently 
and systematically, developing a method by which the listener’s stress judgments can 
be recorded and determining by statistical treatment the influence of the physical 
variations. The experiments reported in this paper are concerned with the first three 
dimensions only, that is, with variations in the duration, intensity and fundamental 
frequency of the speech stimulus. One set of such experiments has already been 
reported in some detail (Fry, 1955). 


SYNTHESIS OF THE TEST MATERIAL 


The essence of this method is that the properties of the speech signals may be 
closely controlled. This is generally not possible in the case of live speech and only 
partially so in recorded speech, so that the most satisfactory method is to synthesize 
the required speech sounds in some way that will afford the necessary control over 
all the variables of the speech. The pattern playback equipment at the Haskins 
Laboratories was used for the purpose (see Liberman, 1952). In this machine,
speech-like sounds are generated and controlled by means of a painted spectrogram, which
can be made to resemble to any desired degree a spectrogram from live speech. As in 
the common type of speech spectrogram, the frequency composition of the sound (its 
formant structure) is related to the disposition of the pattern with respect to the 
vertical axis, the total intensity of the sound depends on both the area and the density 
of the traces, and the duration of any segment is associated with the extent of any 
configuration along the horizontal axis. The painted spectrogram forms the control 
system in the process of speech synthesis. The pattern playback equipment generates 
an extended range of harmonics of a single fundamental (120 c.p.s.) and does not 
afford the possibility of changing the fundamental frequency of the synthesized sounds. 
The apparatus was used for the first series of experiments concerned with the duration 
and intensity of the synthesized syllables ; in these, the fundamental frequency was 
kept constant at 120 c.p.s. The second series was concerned with the effect of varying 
the fundamental frequency, and for this purpose a modification of the Vocoder (the 
Voback) was used (Borst and Cooper, 1957). The same painted spectrograms in this 
case controlled the output of the channels of a Vocoder synthesizer unit, and additional 
tracks on the spectrogram controlled the switching on of the pseudo-larynx tone and 
the frequency of this tone (the larynx frequency). 


LISTENERS’ JUDGMENTS OF STRESS 


The next problem in these experiments was to formulate the questions to be asked 
of the listeners. In all projects of this nature, it is an advantage if the subjects used 
can be induced to supply an operational response to the speech stimulus in conditions 
that do not differ too widely from those of normal speech communication. In 
experimenting with variations at the phonemic level, it is possible to achieve this 
satisfactorily by asking the subject to write down or to speak back what he hears. 
No special training in phonetic techniques is needed to enable the subject to show that 
he takes one stimulus to be key and another, tea. Reaction to differences of stress 
is in another category in the sense that orthography does not mark stress variations 
and the subject has no ready-made code in which to record them. As a consequence, 
the untrained subject is less aware of stress than of phonemic distinctions and it is 
correspondingly difficult to evoke an operational response to stress differences. There 
is in English, however, an association between stress pattern and grammatical function 
in certain classes of word ; for most English speakers, the word /ˈsʌbdʒikt/, with
trochaic rhythm, is a noun, and the word /səbˈdʒekt/, with iambic rhythm, a verb.
It has been found that listeners with no phonetic training, on hearing an isolated word 
of this type, can judge whether they hear the noun or the verb form and in this 
way can register whether they hear the stress on the first or second syllable. The
material used was confined to five pairs of words, all of this type: subject, object, 
digest, contract, permit.


THE ANALYTICAL BASIS OF THE TEST MATERIAL


The next task was to synthesize material for listening tests in which variations in 
the chosen physical parameters could be made systematically. This involved a 
decision on three major points : the physical dimensions to be explored, the range 
of variation to be covered and the size of the steps within each range. The obvious 
basis for such decisions is to be found in analytical study of the type of material to 
be synthesized and spectrograms were made of utterances of the test words by a 
number of different speakers. An account of this work, together with some of the 
measurements obtained, is to be found in a previously published report (Fry, 1955), 
and it will be enough here to indicate the general method. The selected words, both 
nouns and verbs, were included in sentences and great care was taken to ensure a 
common context, as far as possible, for both the noun and the verb in each pair. 
Twelve speakers then recorded all the sentences and spectrograms were made from 
the recordings. 


The physical parameters selected for the first series of experiments were duration 
and intensity and the spectrograms were examined and measured in order to establish 
the modes and range of variation which were associated with the two word classes, 
noun and verb. Several well-marked features emerged as a result of this analysis. 
First, the differences between a noun and a verb were carried almost entirely by the 
“ vowel” stretches of the wave-motions (see Fry, 1955) and it was evident that in 
synthesizing test material the whole range of variation might justifiably be made in 
the “vowel” stretches. Second, the distribution of both durations and intensities 
showed a well-defined bi-modality ; that is to say the noun/verb opposition was 
reflected in the physical data and in fact there was very little overlapping of the values 
for the members of each pair of words. 


This effect was even more apparent when the ratio of one vowel to another was 
plotted rather than the absolute value for either duration or intensity. This agrees with 
the linguistic description of stress as a relation between syllables and is very much 
to be expected at the physical level since stress relations survive changes in the rate 
of utterance (involving changes in absolute durations) and also changes in the mean 
intensity level of the speech. Hence the synthesis of test material was carried out 
having regard to suitable ratios of duration and intensity and the range of variation 
was established in similar terms. 
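The argument for working with ratios can be checked numerically. The Python sketch below uses hypothetical vowel durations and intensities (not measurements from the analytical data) and assumes that intensity ratios are expressed as 10·log10 of the quotient of the two intensities; it shows that a uniform change in rate of utterance leaves V1/V2 unchanged, and that a uniform change in overall level leaves the decibel difference unchanged.

```python
import math


def duration_ratio(v1_ms: float, v2_ms: float) -> float:
    """Duration of vowel one divided by duration of vowel two (V1/V2)."""
    return v1_ms / v2_ms


def intensity_ratio_db(v1_intensity: float, v2_intensity: float) -> float:
    """Intensity of vowel one relative to vowel two, in decibels."""
    return 10.0 * math.log10(v1_intensity / v2_intensity)


# Hypothetical values for one utterance, chosen only for illustration.
v1_ms, v2_ms = 180.0, 120.0
v1_intensity, v2_intensity = 4.0e-6, 1.0e-6

base_duration = duration_ratio(v1_ms, v2_ms)
base_intensity = intensity_ratio_db(v1_intensity, v2_intensity)

# Faster utterance: every segment shortened by the same factor.
fast_duration = duration_ratio(v1_ms * 0.8, v2_ms * 0.8)
# Louder utterance: every intensity raised by the same factor (+10 db).
loud_intensity = intensity_ratio_db(v1_intensity * 10, v2_intensity * 10)

print(f"duration ratio V1/V2: {base_duration:.2f} (normal), {fast_duration:.2f} (faster)")
print(f"intensity ratio V1/V2: {base_intensity:.1f} db (normal), {loud_intensity:.1f} db (louder)")
```

Absolute durations and intensities both change in the faster and louder versions, while the two ratios stay put, which is why the test material was specified in terms of ratios.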


The third feature of the analytical data was that the distribution of duration and 
intensity ratio showed certain differences. Fig. 1 shows that the measurements fall 
into two groups with a well-defined cross-over point from noun to verb values ; in 
the case of intensity, this fell approximately in the middle of the range for all the five 
pairs of words. That is to say, the range of intensity ratios covered by the twelve 
speakers was approximately the same in the noun as in the verb ; for subject, the 
ratio V1/V2 was 14 db. in the noun and −14 db. in the verb, with the cross-over
point at equal intensity for the two vowels. In the case of duration ratio, each pair
of words had its own pattern of variation ; the range and the cross-over value were
different for each of the five pairs of words. For contract, for example, the range
of duration ratios was from 0.1 to 1.06 and the cross-over value 0.50, while for digest
the range was from 0.53 to 2.87, and the cross-over at 1.25.

Fig. 1(a). Measured vowel durations for the word-pair subject (axes: duration of vowel one, duration of vowel two).


In selecting values of duration and intensity for synthesizing the test words, the
chief object was to cover as nearly as possible the total range of observed values and
at the same time to make certain of exploring the part of the range close to the
cross-over value from noun to verb. On the basis of the analytical data, it was decided
to adopt an intensity range of ±10 db. for all the five pairs of words and to use
a different range of duration ratios for each pair.

Fig. 1(b). Measured intensities for the word-pair subject. In the plotting of intensity, the over-all
intensity level is brought to the same value for all speakers.

The number of steps to be used in each range was partly determined by the length 
of the listening test that subjects could be asked to undergo and it was found that, 
from this point of view, five steps in each dimension was a suitable number. In both 
duration and intensity, the two extreme values were chosen to be near the ends of 
the observed range, the middle value was approximately at the cross-over value from 
noun to verb and the two intervening values were chosen with the object of exploring 
the uncertainty range between noun and verb. In the case of subject, for example,
the observed duration ratios (V1/V2) ranged from 0.15 to 1.28, with the cross-over at
0.66, and the chosen experimental values were 0.25, 0.40, 0.60, 1.00 and 1.25. For
all pairs of words, the intensity ratios (V1/V2) were −10, −5, 0, 5 and 10 db.


THE COMBINATION OF PHYSICAL CUES 


It has been pointed out already that judgments of stress depend upon a complex 
of perceptual factors which are interdependent. It follows that the effects of the 
physical correlates of these perceptual factors are also likely to be inter-related. 
In any speech sequence presented to a listener, the duration, intensity, fundamental 
frequency and formant structure all act as cues which determine the listener’s stress 
judgments and there is no method of rendering any of these physical dimensions 
inoperative. The clearest example of this is to be found, perhaps, in the formant 
structure of the speech sounds. In the verb /səbˈdʒekt/ the first syllable contains
the vowel /ə/ and the second /e/, and formant structure typical of these sounds is
an important factor in determining the listener’s stress judgment. A modification of
formant structure, in the direction of /ʌ/ in the first vowel or in the direction of /i/
in the second, would at once bias the stress judgments towards the trochaic or noun 
form. In synthesizing these words, therefore, whatever the formant structure may be, 
it is bound to exert a biasing effect. Hence in experiments with synthesized speech, 
we may decide to vary any one of the four physical dimensions and to keep the other 
three constant, but the chosen values for the latter will none the less contribute to 
the listeners’ stress judgments. 


THE DURATION AND INTENSITY TEST 


In the first series of experiments, it was decided to maintain constant values for 
formant structure and for fundamental frequency and to vary duration and intensity. 
Fundamental frequency for all the voiced sounds was kept constant at 120 c.p.s. The
formant structure during the vowel stretches gave a vowel quality corresponding to 
the stressed vowel in every case ; that is, the first vowel in all versions of subject 
sounded like /ʌ/ and the second vowel, like /e/, and similarly for all the other
word-pairs. Hence the biasing effect of the formant structure would tend in the 
opposite direction in the first and second syllables of a word and would thus be 
partially cancelled out. Another consideration was that the test was first made with 
a large group of American listeners. In American speech, it happens quite commonly 
that there is little or no opposition of vowel quality in such noun and verb pairs and 
hence the biasing effect would be rather less considerable. It turned out, in practice, 
that there was no marked difference between the responses of the American subjects 
and those of a small group of English subjects.

The variations in duration and intensity ratio covered the required ranges in five
steps, as has been already indicated. In order to economize in test material, the two 
sets of variations were combined together in one set of test items. For each of the 
five word-pairs, versions were synthesized which covered the five steps of duration 
ratio and the five steps of intensity ratio, each value of duration being combined with 
each value of intensity. This gave a listening test of 125 items, which appeared to be 
about the longest test that listeners could comfortably manage on one testing occasion. 
All versions of the test word-pairs were recorded and assembled in random order. 
Each test item was inserted in a carrier sentence (also synthesized) and was heard in 
the context “Where is the accent in ___?” Listeners were asked to make a
response to every item and to register this on a test sheet where the appropriate 
word-pair was printed for each test-item in this form : SUBject : subJECT, CONtract 
: conTRACT and so on. They were asked to underline the form that they heard. 
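The construction of the 125-item test can be sketched as follows. This is a minimal Python illustration, not the procedure actually used: only the duration ratios for subject are given in the text, so the other word-pairs are represented here by a duration step index (0 to 4), and the helper names, the random seed and the tuple layout are assumptions.

```python
import itertools
import random

WORD_PAIRS = ["subject", "object", "digest", "contract", "permit"]
# Duration ratios (V1/V2) used for "subject"; every word-pair had its own five
# values, so items below record only the step index into that word's range.
SUBJECT_DURATION_RATIOS = [0.25, 0.40, 0.60, 1.00, 1.25]
DURATION_STEPS = range(5)
INTENSITY_RATIOS_DB = [-10, -5, 0, 5, 10]


def build_test(seed: int = 0) -> list[tuple[str, int, int]]:
    """Every (word, duration step, intensity in db) combination, shuffled."""
    items = list(itertools.product(WORD_PAIRS, DURATION_STEPS, INTENSITY_RATIOS_DB))
    random.Random(seed).shuffle(items)  # hypothetical seed, for a repeatable order
    return items


def duration_label(word: str, step: int) -> str:
    """Show the actual V1/V2 value for "subject"; other words get the step index."""
    if word == "subject":
        return f"V1/V2 = {SUBJECT_DURATION_RATIOS[step]:.2f}"
    return f"duration step {step}"


test_items = build_test()
print(len(test_items), "items")
for word, step, db in test_items[:3]:
    print(f"Where is the accent in {word}?  ({duration_label(word, step)}, {db:+d} db)")
```

Five word-pairs, five duration steps and five intensity steps give 5 × 5 × 5 = 125 items, the length of the test described above.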





RESULTS OF THE DURATION AND INTENSITY TEST 


This test was carried out by 118 subjects ; the effect of variation in the physical 
cues was measured in terms of the proportion of these listeners who judged a given 
stimulus to be a noun or verb, that is to have trochaic or iambic rhythm. Since all 
subjects made a judgment about every test item, the number of noun judgments for 
one item is equal to 118 − (the number of verb judgments). For simplicity, therefore,
all results of the test are given as the number of noun judgments, usually presented 
as a percentage of the total number of subjects. 

In the case of all five word-pairs, the total range of stimuli was enough to cause a 
complete swing of the listener’s judgments from noun to verb ; one version in each 
set produced a noun judgment from 97-100% of the listeners, and at the other end 
of the range, one version produced less than 10% of noun judgments, with the 
exception of permit in which the lowest value was 13%. The change in judgments 
followed the expected trend: where V1 was long in proportion to V2, there was a 
majority of noun judgments, and similarly where V1 was more intense than V2. The 
effect was reinforced in versions where V1 was both longer and more intense than V2. 
The disagreement amongst subjects was greater, that is the percentage of noun judg- 
ments was nearest to 50%, when the duration and intensity cues were opposed to each 
other, as for example in versions where V1 was longer but of lower intensity than V2. 


THE RELATIVE STRENGTH OF THE DURATION AND THE INTENSITY CUE 


There is no doubt from the experimental results that in the English word-pairs used 
in the test, both duration and intensity ratio have a marked influence in determining 
stress judgments. An interesting question that one might try to answer on the basis
of these results concerns the relative strength of the two cues. Information on this
point can be abstracted from the results by summing the noun judgments for all
intensity ratios at each duration ratio, i.e. by taking the mean of the column values in 
the matrix of results. This gives the effect of changing duration ratio, and similarly 
summing for each intensity ratio, i.e. taking the row averages, gives the effect of 
changing intensity. The total taken for all five word-pairs showed that the total change 
in noun judgments due to duration was from 12% to 92%, and that due to intensity 
ratio was from 40% to 82%. 
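The row and column averaging described above can be sketched in Python. The matrix of verb-judgment counts here is hypothetical and only illustrates the bookkeeping: rows are taken to be the five intensity ratios (−10 to +10 db), columns the five duration ratios (shortest V1 first), and noun judgments are recovered as 118 minus the verb judgments, as in the text.

```python
N_SUBJECTS = 118

# Hypothetical verb-judgment counts (rows: intensity -10..+10 db,
# columns: duration ratio from shortest to longest V1). Not the paper's data.
VERB_COUNTS = [
    [110, 100, 80, 40, 20],
    [105, 90, 70, 35, 15],
    [100, 80, 60, 30, 10],
    [95, 70, 50, 25, 8],
    [90, 60, 40, 20, 5],
]


def noun_percentages(verb_counts):
    """Every listener judged every item, so nouns = N_SUBJECTS - verbs."""
    return [[100.0 * (N_SUBJECTS - v) / N_SUBJECTS for v in row] for row in verb_counts]


def marginal_means(matrix):
    """Column means give the duration effect, row means the intensity effect."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    duration_effect = [sum(matrix[r][c] for r in range(n_rows)) / n_rows for c in range(n_cols)]
    intensity_effect = [sum(row) / n_cols for row in matrix]
    return duration_effect, intensity_effect


pct = noun_percentages(VERB_COUNTS)
duration_effect, intensity_effect = marginal_means(pct)
print(f"duration: {duration_effect[0]:.0f}% to {duration_effect[-1]:.0f}% noun judgments")
print(f"intensity: {intensity_effect[0]:.0f}% to {intensity_effect[-1]:.0f}% noun judgments")
```

The spread of each list of averages is the quantity compared in the text: the wider the swing in noun judgments across the five steps, the stronger the cue appears on this crude measure.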

In order to establish the significance of this relation, we need to make a quantitative 
comparison of the duration and intensity ratios used in the experiment. Since the 
range of values was approximately equal to those found in the analytical data, that 
is in natural speech, the range of duration change can be regarded as at least in this 
sense equivalent to the range of intensity change. In Fig. 2, the aggregate of noun 
judgments for each duration and intensity ratio is plotted. This is a formal representa- 
tion of the results in which the abscissae are simply succeeding steps of duration or 
intensity change and not points on a quantitative scale. It is evident from the 
experimental results that an extension of the duration range would not lead to any 
major change in noun judgments since these already cover nearly the whole range 
0-100%. Whether extension of the intensity range would give judgment values 
near to 0 or to 100% could be determined by experiment, but it was in fact clear from 
the preliminary syntheses that preceded the final test that extreme steps of intensity 
change from V1 to V2 served only to make the stimulus sound very unnatural without 
increasing the impression of strong stress. Such an experiment would, further, leave 
unresolved the question of equivalence between duration range and intensity range and 
it seemed therefore worth while to seek an alternative method of treating the existing 
results in order to reach some conclusion concerning the relative strength of the 
duration and intensity cues. 

As we have already said, the response to any stimulus in the test is made up of 
four factors: the response due to duration, that due to intensity, to fundamental 
frequency and to formant structure. The force of any of these factors could be more 
reliably abstracted from the data if the degree of agreement amongst subjects were 
expressed on a scale which was not artificially compressed by the barriers of 0 and 
100%. Such a measure is provided by taking the logit number for each test item 
instead of the percentage of noun judgments. The subjects were able to make one 
of two responses to each item. If $p$ = proportion of noun judgments, and $q = 1 - p$
= proportion of verb judgments, then $\operatorname{logit} p = \log_e (p/q)$. The range of logit values
will be $\pm\infty$; the smallest degree of agreement (50%) will have the logit 0 ; positive
values of logit p will indicate agreement in a noun judgment and negative values 
agreement in a verb judgment. The logit response for each test item will represent 
a factor due to duration and a factor due to intensity and these factors can then be 
abstracted as before by taking the row and column averages of the matrix of results. 
An inspection of the crude data made it clear that they would not yield an exact fit 
with this type of treatment since there were several irregularities in the pattern. A
difficulty arises with values of 0% and 100%, which would theoretically give logits of
$-\infty$ and $+\infty$; it seemed good enough for our purposes to consider them crudely
as 0.5% (logit = −5.293) and 99.5% (logit = 5.293), since the irregularities in the
pattern make it impossible to use the most refined statistical methods.

Fig. 2. Percentage of listeners’ “noun” judgments for all test words as a function of
(a) vowel duration ratio and (b) vowel intensity ratio.
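The logit transformation described above, with 0% and 100% replaced by 0.5% and 99.5%, is short enough to write out directly. A minimal Python sketch (the function name is ours, not the paper's):

```python
import math


def logit_percent(percent_noun: float) -> float:
    """logit p = ln(p/q), where p is the proportion of noun judgments and q = 1 - p.

    0% and 100% would give -inf and +inf, so they are replaced by 0.5% and
    99.5% as in the text.
    """
    if percent_noun <= 0.0:
        percent_noun = 0.5
    elif percent_noun >= 100.0:
        percent_noun = 99.5
    p = percent_noun / 100.0
    q = 1.0 - p
    return math.log(p / q)


for pct in (0, 10, 50, 90, 100):
    print(f"{pct:>3}% noun judgments -> logit {logit_percent(pct):+.3f}")
```

With 118 listeners the smallest non-zero percentage is about 0.85%, so the substitution only affects items on which the listeners were unanimous.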


The procedure was to calculate the logit values for all percentages occurring in the 
results and to tabulate these for each of the word-pairs used in the test. The common 
logit for each duration ratio was obtained by taking the column averages and for each 
intensity ratio by taking the row averages. The supposition is that the logit for any 
combination of duration and intensity can be expressed as a sum of a duration effect 
and an intensity effect: one may reasonably expect this to be approximately true
although, as we have said, the irregularities in the distribution make it impossible to
test the hypothesis with any exactness. We can only measure relative effects : since
both the column and the row averages may be considered to contain the general
average, we have subtracted this general average from all column (i.e. duration)
averages to avoid counting it twice over. The logits for each word-pair were plotted
separately as a function of duration and intensity ratio and a typical result (for the
word-pair subject) is given in Fig. 3.

Fig. 3(a). Common logit values for duration ratio from the results for the word-pair subject.
Fig. 3(b). Common logit values for intensity ratio from the results for the word-pair subject.
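The additive model and the removal of the general average can be sketched in Python. The decomposition below follows the description in the text (column averages for duration, row averages for intensity, general average subtracted from the duration averages only). The function names and the example matrix, which is made exactly additive so that the fit can be checked, are hypothetical.

```python
def additive_effects(logits):
    """Split a logit matrix (rows: intensity steps, columns: duration steps).

    Assumed model: logit[r][c] ~ duration_effect[c] + intensity_effect[r].
    The general average is subtracted from the column (duration) averages
    only, so that it is not counted twice when the two effects are added.
    """
    n_rows, n_cols = len(logits), len(logits[0])
    general = sum(sum(row) for row in logits) / (n_rows * n_cols)
    duration_effect = [
        sum(logits[r][c] for r in range(n_rows)) / n_rows - general for c in range(n_cols)
    ]
    intensity_effect = [sum(row) / n_cols for row in logits]
    return duration_effect, intensity_effect


def fitted(duration_effect, intensity_effect):
    """Logits predicted by the additive model for every cell."""
    return [[d + i for d in duration_effect] for i in intensity_effect]


# Hypothetical, exactly additive logits: 1.2 per duration step, 0.3 per intensity step.
example = [[-2.0 + 1.2 * c + 0.3 * r for c in range(5)] for r in range(5)]
dur, inten = additive_effects(example)
print("duration effects: ", [round(x, 2) for x in dur])
print("intensity effects:", [round(x, 2) for x in inten])
```

For real data the fitted and observed logits would differ by the irregularities mentioned above; the sum of the two effects only reproduces the matrix exactly when the data are truly additive.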


It will be seen that the logits both for duration and for intensity lie approximately 
on a straight line. We may conclude from this that succeeding steps of duration change 
produce equal changes in the logit and the same is true for intensity changes. This
means that the ratio p/q, i.e. noun/verb, is multiplied by nearly the same factor for 
equal changes in duration and intensity and thus rises in a geometrical progression. 
Since this is so, we may now compare the effect of duration and intensity by comparing 
the slope of the two lines. In the case of subject, the whole range of intensity change 
of 20 db. produces a logit rise of 2.5. On the duration line, a change in the logit of
2.5 is effected by a change in duration ratio of approximately 0.6. Similar calculations
in the case of the other word pairs give the following results: object, 20 db. is
approximately equivalent to a duration ratio change of 0.4; digest, 0.16; contract, 0.35;
and permit, 0.9. This method of treating the data therefore affords a means of making a
quantitative comparison of the duration and intensity cues and their influence on stress
judgments.
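The slope comparison can be illustrated numerically. In the Python sketch below, a least-squares fit stands in for reading the slope off a plotted line (an assumption: the text does not say how the lines were drawn), the intensity logits are hypothetical, and the figures quoted for subject (a logit rise of 2.5 over 20 db, and over a duration-ratio change of about 0.6) are used to recover the stated equivalence.

```python
import math


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def duration_equivalent(change_db, intensity_slope, duration_slope):
    """Duration-ratio change giving the same logit change as change_db."""
    return change_db * intensity_slope / duration_slope


# Hypothetical intensity logits lying exactly on a line (0.125 logit per db).
db_steps = [-10, -5, 0, 5, 10]
logits = [-1.25, -0.625, 0.0, 0.625, 1.25]
intensity_slope = ls_slope(db_steps, logits)

# Equal logit steps mean the noun/verb odds p/q grow geometrically.
odds = [math.exp(v) for v in logits]
step_factors = [b / a for a, b in zip(odds, odds[1:])]

# Figures quoted for "subject": 2.5 logits per 20 db and per 0.6 of duration ratio.
subject_equiv = duration_equivalent(20, 2.5 / 20, 2.5 / 0.6)

print(f"intensity slope: {intensity_slope:.3f} logit per db")
print("odds multiplied at each step by:", [round(f, 3) for f in step_factors])
print(f"20 db ~ duration-ratio change of {subject_equiv:.2f}")
```

The constant step factor is the geometric progression of p/q mentioned in the text; the last line simply restates the equivalence for subject as a ratio of the two slopes.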


STRESS JUDGMENTS AND VARIATION IN FUNDAMENTAL FREQUENCY 


A second series of experiments was undertaken to explore the role of fundamental 
frequency variations in determining stress judgments. It is clear that such variations 
will affect the intonation pattern perceived by the listener, and in English speech, the 
perception of a rhythmic pattern is very closely bound up with the 
perception of intonation. In the case of the word-pairs used in the previous 
experiment, in many contexts the sentence intonation pattern would have an over- 
riding influence on the decision as to whether the noun or verb form had occurred. 
This factor complicates the problem of examining the part played by fundamental 
frequency variation in stress judgments since the resulting pitch changes will not 
only contribute to the perception of stress but will tend to impose upon the stimulus 
sequence a sentence intonation which may be decisive for the stress judgment. 

The purpose of this set of experiments was to study the effect of fundamental 
frequency variation in conditions where the influence of sentence intonation is reduced 
to a minimum. It is generally true that in English the functionally most important 
parts of a sentence intonation pattern are syllables in which the pitch changes in the 
course of the syllable. Syllables with level tone are the nearest one can get to units 
that are neutral with respect to sentence intonation, though it is obviously not possible
to make sentence intonation inoperative. These experiments were made, therefore, 
by synthesizing sequences in which fundamental frequency remained constant 
throughout a syllable and any change of frequency was made between syllables. 

The material was confined to the word-pair subject. Versions of this were 
synthesized in which a change of fundamental frequency was effected at the junction 
between the first and second syllable. It has already been said that this synthesis
was carried out with the Voback (Borst and Cooper, 1957), a device in which hand- 
painted spectrograms are used to control the synthesizer action of an 18-channel 
Vocoder. The duration, intensity and formant structure of the synthesized sounds 
were controlled in a manner similar to that used in the previous experiments (though 
the sound produced by the Vocoder is of a different character) and additional sections 
of the painted pattern served to control the switching of the buzz and hiss generators 
and the frequency of the fundamental during the buzz sequences. This frequency was 
measured by means of a General Electric audio-frequency meter connected across the 
buzz generator. 
As in the previous test, two physical dimensions were explored at the same time ;
the duration ratios already used in subject were used in the new test and were combined 
with step-changes of frequency ranging from 5 c.p.s. to 90 c.p.s. The intensity ratio 
was constant at equal intensity for V1 and V2 and the formant structure was the same 
as in the first test. 


The choice of fundamental frequencies for this test involved a number of considera- 
tions that should be briefly mentioned. The listeners were to hear a series of sense- 
groups, each containing two syllables, and to make a judgment about the stress pattern. 
The effect of sentence intonation was to be minimized, but apart from this it was 
desirable that the stimuli should be as natural as possible since this was likely to 
make the judgments more consistent. In English speech there is a strong tendency 
for a sense group to be spoken in one key and for musical modulation to take place 
between groups. This effect of key depends largely upon the occurrence in the group 
of some reference pitch, of which the speaker is unaware, but which regulates the 
pitch of all the syllables in the group. In the test items it was therefore decided to 
adopt a reference frequency which would occur in every item, and in order to limit the 
number of variables in the test, the same reference was used throughout the test. 
The synthesized speech was intended to sound like that of a male speaker, and the 
selected reference frequency of 97 c.p.s. gave this effect successfully. 


The range of variation in fundamental frequency was decided on similar grounds. 
In the intonation patterns heard from most English speakers changes in pitch of more 
than one octave are infrequent and are not often met with in successive syllables, even 
from the most excitable speakers. Preliminary syntheses showed in fact that a change 
of 90 c.p.s. on 97 c.p.s. (approximately a semi-tone less than one octave) produced 
stimuli that sounded rather unnatural and hence this upper limit was adopted as 
being likely to show up the maximum effect of frequency change without introducing 
very unnatural stimuli which would perhaps make listeners respond in a random 
manner. 
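The interval arithmetic behind this upper limit can be checked with a short sketch that converts each frequency step above the 97 c.p.s. reference into equal-tempered semitones, using 12·log2(f2/f1). It is only an illustration of the calculation; the step values are those listed for the final test.

```python
import math

REFERENCE_HZ = 97.0  # reference fundamental (c.p.s.) used throughout the test

def semitones(f_low, f_high):
    """Size of the interval between two frequencies, in equal-tempered semitones."""
    return 12.0 * math.log2(f_high / f_low)

for step in (5, 10, 15, 20, 30, 40, 60, 90):
    size = semitones(REFERENCE_HZ, REFERENCE_HZ + step)
    print(f"{step:>2} c.p.s. step: {size:5.2f} semitones")
```

For the 90 c.p.s. step (97 to 187 c.p.s.) this gives roughly 11.4 semitones, that is, a little over half a semitone short of a full octave.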


The relation between the reference frequency and that of the other syllable was 
found to be important for the naturalness of the stimulus. Each syllable was on one 
tone, that is of constant frequency, and if the relation between the syllables was such 
as to make the impression of an exact musical interval, the test word appeared to be 
sung and listeners found it difficult to make a stress judgment. Care was taken 
therefore to avoid this effect as far as possible and this was one reason for the fixing 
of the reference frequency at 97 c.p.s. In preliminary experiments a reference 
frequency of 100 c.p.s. with frequency intervals of multiples of 5 c.p.s. was used. 
Many of the stimuli then had much too musical an effect which was eliminated by 
the change of the reference frequency to 97 c.p.s. Frequency changes as small as 
3 c.p.s. were used in the first experiments but listeners’ responses to these items were 
very inconsistent and were disregarded in the final test. The frequency steps 
ultimately selected were designed to explore adequately the range of variation up to
90 c.p.s. and the experimental values were: 5, 10, 15, 20, 30, 40, 60 and 90 c.p.s. 
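Why a 100 c.p.s. reference with steps in multiples of 5 c.p.s. tends to sound sung can be illustrated by checking how close each interval lies to a small-integer (just) ratio such as 6/5 or 3/2. The sketch below is a hypothetical aid, not the procedure used in the experiments: the limit on the denominator (6) and the 10-cent tolerance for "sounding musical" are assumptions chosen for illustration.

```python
import math
from fractions import Fraction

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(ratio)

def nearest_simple_ratio(f_low, f_high, max_den=6):
    """Small-integer ratio n/d (d <= max_den, between 1 and 2) closest to f_high/f_low.

    Returns the fraction and the deviation of the actual interval from it, in cents.
    """
    actual = cents(f_high / f_low)
    candidates = sorted({Fraction(n, d) for d in range(1, max_den + 1)
                         for n in range(d, 2 * d + 1)})
    best = min(candidates, key=lambda fr: abs(actual - cents(float(fr))))
    return best, actual - cents(float(best))

def sounds_musical(f_low, f_high, tolerance_cents=10.0):
    """Hypothetical criterion: the interval lies within tolerance of a simple ratio."""
    _, deviation = nearest_simple_ratio(f_low, f_high)
    return abs(deviation) <= tolerance_cents

steps = [5, 10, 15, 20, 30, 40, 60, 90]
for reference in (100.0, 97.0):
    flagged = [s for s in steps if sounds_musical(reference, reference + s)]
    print(f"reference {reference:.0f} c.p.s.: steps near a simple ratio -> {flagged}")
```

Running the loop for both references gives a rough, assumption-dependent picture of how shifting the reference away from a round figure breaks up exact musical intervals.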

It was expected that an important factor in determining stress judgments would be 
the direction of frequency change in the course of the stimulus word and it was 
necessary therefore to make the step-change of frequency in both directions, that is 
in one case with the first vowel on a higher frequency than the second, in the other 
case with the first vowel lower. In all cases the lower vowel was at the reference 
frequency of 97 c.p.s. The total number of frequency changes was therefore 16, each of the
8 intervals used in two directions. The 5 duration ratios were combined with each of 
the frequency changes, giving a total of 80 test items. In this test the items were 
not inserted in a carrier sentence since this would tend to increase rather than to 
minimize the influence of sentence intonation on the results. Subjects were asked 
to register their responses in the same way as in the previous test. 
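The factorial design just described (8 step sizes, 2 directions, 5 duration ratios) can be enumerated as a sketch. The labels D1 to D5 are hypothetical placeholders for the five duration ratios, whose values are not restated here; the rule that the lower vowel always sits at 97 c.p.s. is taken from the text.

```python
from itertools import product

REFERENCE_HZ = 97
STEPS_HZ = [5, 10, 15, 20, 30, 40, 60, 90]
DIRECTIONS = ["down", "up"]   # "down": V1 higher than V2; "up": V1 lower than V2
DURATION_RATIOS = ["D1", "D2", "D3", "D4", "D5"]  # hypothetical placeholder labels

def vowel_frequencies(step, direction):
    """Fundamental frequency (c.p.s.) of V1 and V2; the lower vowel is at the reference."""
    higher = REFERENCE_HZ + step
    return (higher, REFERENCE_HZ) if direction == "down" else (REFERENCE_HZ, higher)

items = []
for ratio, step, direction in product(DURATION_RATIOS, STEPS_HZ, DIRECTIONS):
    v1, v2 = vowel_frequencies(step, direction)
    items.append({"duration_ratio": ratio, "step_hz": step, "direction": direction,
                  "f0_v1": v1, "f0_v2": v2})

print(len(items), "test items")
```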


RESULTS OF THE FUNDAMENTAL FREQUENCY TEST 


The effect of pitch on the perception of stress is generally held to be that a 
higher pitch produces an impression of greater stress. This experiment was designed 
to test first the hypothesis that, if two syllables differ in fundamental frequency, the 
syllable having the higher frequency is more likely to be judged as stressed. It was 
intended also to determine whether this principle, if it operates at all, is subject to 
modification through the effect of duration ratio, which was shown by the first 
experiment to be an important factor in stress judgments. Last, the experiment was 
intended to show whether the size of a frequency step between syllables has a marked 
effect on stress judgments. 


If a syllable of higher fundamental frequency tends to be judged stressed, then in
this test the step-down change of fundamental would lead listeners to perceive the 
stress on the first syllable of the test word, that is, it would tend to increase the 
number of noun judgments, and the step-up change would decrease the number of 
noun judgments. In all, 41 subjects carried out this test ; they included a group of 
American and a group of English speakers. Fig. 4 gives the results of the test as 
percentage noun judgments for each duration ratio, with the step-down and step-up 
changes plotted separately. 


The effect of changing duration ratio re-appears clearly in the results of this test 
and the shape of the curves is similar to those obtained in the first experiment, but 
there is good evidence of the effect of fundamental frequency change suggested by 
the hypothesis. The step-up change of frequency moves the whole curve in the
direction of fewer noun judgments and the step-down change displaces it in the 
direction of more noun judgments. The difference between means for step-up and 
step-down change is significant at the 1% level for all duration ratios. 
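The text does not say which significance test was applied. One common way to compare the proportion of noun judgments for step-up and step-down items at a single duration ratio is a two-proportion z-test with a pooled standard error, sketched below as an assumption, not as the author's method. The counts are hypothetical; 328 judgments per direction corresponds to 41 listeners times 8 step sizes. The test also treats every judgment as independent, which repeated judgments by the same listener do not strictly satisfy.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions (pooled SE).

    Returns (z, p_value), where z > 0 means the first proportion is larger.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p_value

# Hypothetical noun counts at one duration ratio (placeholders, not the paper's data).
z, p = two_proportion_z(120, 328, 200, 328)  # step-up vs step-down
print(f"z = {z:.2f}, two-sided p = {p:.2g}, significant at 1%: {p < 0.01}")
```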
Fig. 4. The effect of step changes of fundamental frequency on “noun” judgments for the
word-pair subject. 


EFFECT OF THE SIZE OF THE STEP-CHANGE IN FREQUENCY 


In the case of both duration and intensity ratio it has been shown that progressive 
increase in these quantities is reflected in increasing noun judgments by the subjects. 
The next question to be asked with regard to fundamental frequency is whether 
increase in the frequency ratio of V1 to V2 would have a similar effect, or whether 
fundamental frequency change, unlike duration and intensity change, tends to produce 
an all-or-none effect.



The effects of frequency change were abstracted from the data by combining all 
duration ratios for each step change of frequency. In order to detect any possible 
trend in the results, the logit response for each frequency was calculated and the 
values are shown in Fig. 5. The first important feature of these results is the 
discontinuity in logit response between the values -5 and +5 c.p.s., that is at the
cross-over from a step-up to a step-down change in fundamental frequency. This 
confirms the conclusion already reached by inspecting the results for duration ratio 
in this experiment. Increase in the size of the frequency step appears to produce no 
marked trend in the results, however. The logit values for the step changes lie 
approximately on a horizontal line, indicating that the size of the change is having 
no appreciable effect. For the step-up change, if there is any trend, it is in the 
direction opposite to the expected one. An increase in the size of the step-up gives 
a slight increase in noun judgments, rather than the expected decrease. This effect 
is contributed largely by the 90 c.p.s. change and it may well be that this large step-up 
appeared even more unnatural to the listeners than an equal step-down and thus 
caused greater uncertainty in the judgments. 
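The analysis in this paragraph (pooling over duration ratios, taking logits, then looking for a jump at the cross-over and for a trend with step size on a logarithmic frequency scale) can be sketched as follows. The noun counts are invented placeholders for only a few of the sixteen steps, and 41 judgments per cell is an assumption matching the number of listeners; the slope of each line is an ordinary least-squares estimate.

```python
import math

def common_logit(p):
    """Common (base-10) logit log10(p/q) of a proportion 0 < p < 1."""
    return math.log10(p / (1.0 - p))

def pool_over_durations(noun_counts, judgments_per_cell):
    """Pooled proportion of noun judgments for one step, combining all duration ratios."""
    return sum(noun_counts) / (judgments_per_cell * len(noun_counts))

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

# Hypothetical noun counts for the five duration ratios, keyed by signed step (c.p.s.):
# negative = step-up (V1 lower), positive = step-down (V1 higher). Placeholders only.
counts = {
    -90: [5, 10, 16, 22, 30], -20: [4, 9, 14, 20, 28], -5: [4, 9, 15, 21, 28],
    5: [10, 18, 26, 32, 37], 20: [11, 18, 27, 33, 37], 90: [10, 19, 26, 32, 38],
}
logits = {s: common_logit(pool_over_durations(c, 41)) for s, c in counts.items()}

# Discontinuity at the cross-over from step-up to step-down.
jump = logits[5] - logits[-5]
# Trend with step size on a log-frequency scale, fitted separately for each direction.
up = [s for s in logits if s < 0]
down = [s for s in logits if s > 0]
up_slope = ols_slope([math.log10(-s) for s in up], [logits[s] for s in up])
down_slope = ols_slope([math.log10(s) for s in down], [logits[s] for s in down])
print(f"jump = {jump:.2f}, step-up slope = {up_slope:.3f}, step-down slope = {down_slope:.3f}")
```

An all-or-none effect shows up as a clear jump together with slopes close to zero, whereas a graded effect would show non-zero slopes on both sides.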


These results provide good evidence for supposing that a step-change of fundamental 
frequency affects stress judgments in a specific way. It appears likely that so long as 
the resulting pitch change is easily perceptible to the listener, he tends to judge a 
higher syllable as more stressed, but the magnitude of the pitch change makes little 
contribution to his judgment. This would be consistent with the fact that a frequency 
change of 3 c.p.s. led to a dispersion of the listeners’ judgments ; it may well have 
been too small to cause the all-or-none effect in the perception of stress. 


SENTENCE INTONATION AND STRESS JUDGMENTS 


The role of intonation in determining stress judgments has already been touched 
upon in connection with the previous experiment in which efforts were made to reduce 
the influence of sentence intonation. It is clear, however, that any account of the 
factors affecting stress judgments is incomplete without an attempt to answer certain 
questions about sentence intonation. The most important of these is the question 
whether, as one would expect, sentence intonation is so strong as to be capable of 
outweighing all other factors in stress judgments. 


A third set of experiments was carried out to answer this question. As in the 
previous experiments, these were designed to explore a range of variation in physical 
cues and to determine the effect of this variation on stress judgments in the same way 
as before. The important variable was again fundamental frequency, but this time 
the variations were chosen to allow sentence intonation the maximum effect. 


It was said earlier that, broadly speaking, a syllable containing a pitch change is
functionally more important in English intonation than a level syllable, and for this
third test, versions of the word-pair subject were synthesized in which fundamental
frequency changed in the course of one vowel stretch. It should be made clear,
however, that the purpose was not to reproduce faithfully certain English intonations,
but rather to cover a wide range of patterns of fundamental frequency variation and
to study the effect of these.

Fig. 5. Logit response for step change of fundamental frequency. Frequencies are plotted
on a logarithmic scale.
Fig. 6. Types of fundamental frequency change used in the syllable inflection test.


THE SYLLABLE INFLECTION TEST 


Again the intensity ratio of the two vowels in each version was kept constant at 
equal intensity and the same formant structure was used as before. The five duration 
ratios were combined with the fundamental frequency variations. In order to reduce 
the number of variables, the frequency range over which the fundamental varied 
within one vowel was kept constant throughout the test. A reference frequency of 
97 c.p.s. was again used, that is at some time during the stimulus word the fundamental 
reached this minimum value. The highest frequency used was 130 c.p.s. and when
frequency changed in the course of a syllable it covered the whole of this range from 
97 to 130 c.p.s. A number of stimulus words included one level syllable and the 
fundamental frequency for such syllables was either 97 or 130 c.p.s. 


Two types of frequency change within the syllable were used. In the first type, the 
frequency changed continuously throughout the vowel, and in the second, the 
frequency change occupied only half the vowel duration. Fig. 6 shows the graph 
of frequency change with time for the types of syllable used in the test. It will be 
seen that the rate of change of frequency was allowed to vary with the duration of the 
vowel. Stimulus words were synthesized which covered a range of 16 patterns, each 
combined with 5 duration ratios. The different patterns are listed in Table 1 where 
the frequency variation for each word is shown symbolically and the letters serve to 
identify the patterns in discussing the results. 
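The two types of within-vowel change can be sketched as fundamental-frequency contours over a fixed 97 to 130 c.p.s. range, which makes it explicit why the rate of change varies with vowel duration. This is a simplified, hypothetical model: it uses straight-line ramps only (the curvilinear inflections mentioned later are not modelled), and for the half-vowel type it assumes the change occupies the first half of the vowel, which the text does not specify.

```python
LOW_HZ, HIGH_HZ = 97.0, 130.0  # frequency range covered by an inflected syllable

def f0_at(t_ms, duration_ms, kind, rising):
    """Fundamental frequency (c.p.s.) at time t_ms within a vowel of duration_ms.

    kind: "full" (change spans the whole vowel) or "half" (change spans half of it,
    assumed here to be the first half, after which the frequency is held level).
    """
    start, end = (LOW_HZ, HIGH_HZ) if rising else (HIGH_HZ, LOW_HZ)
    if kind == "full":
        change_ms = duration_ms
    elif kind == "half":
        change_ms = duration_ms / 2.0
    else:
        raise ValueError("kind must be 'full' or 'half'")
    frac = min(max(t_ms / change_ms, 0.0), 1.0)
    return start + (end - start) * frac

def rate_cps_per_ms(duration_ms, kind):
    """Rate of frequency change: the fixed 33 c.p.s. range divided by the change time."""
    change_ms = duration_ms if kind == "full" else duration_ms / 2.0
    return (HIGH_HZ - LOW_HZ) / change_ms

# A shorter vowel covers the same range faster.
for d in (100.0, 200.0, 300.0):
    print(d, "ms:", round(rate_cps_per_ms(d, "full"), 3), "c.p.s. per ms")
```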


RESULTS OF THE SYLLABLE INFLECTION TEST 


Responses to this test were obtained from 76 subjects, including both American 
and English speakers. The first important consideration in examining the results is 
that the frequency variations cannot in this case be placed on a quantitative scale ; 
the test was designed to show up an all-or-none effect and it is for this that we have 
to look in the data from the test. It is to be expected, and the data indeed show once 
more, that increasing duration ratio will have the effect of increasing the number of 
noun judgments, but the first question is whether any patterns of frequency variation 
over-ride the duration cue. In the absence of a fundamental frequency cue, for 
example when the five duration ratios are combined with equal intensity in the two 
vowels on a monotone, the smallest duration ratio produces a majority of verb
judgments, and the largest ratio gives a majority of nouns. A simple criterion might 
be applied first of all to the data from the syllable-inflection test and we might look 
for any frequency pattern for which the number of noun judgments either never falls 
below or never reaches 50%, that is for cases in which the whole curve is transposed 
above or below the 50% level. Such cases are to be found in the results and Fig. 7
gives the curves for two such patterns, A and B. For pattern A, even with the 
smallest duration ratio, there is a majority of noun judgments and for pattern B, the 
greatest duration ratio still produces a substantial majority of verb judgments. These 
two frequency patterns will, obviously, sound to the listener like two common English 
intonation patterns in which the fall normally occurs in the stressed syllable and it is 
not surprising that they should influence stress judgments so strongly. A similar 
effect is to be found for patterns J and M, which are functionally similar to A and B. 
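The simple criterion described above, where the noun percentage either never falls below 50% or never reaches it across the five duration ratios, can be written as a small classifier. The percentages in the example are hypothetical placeholders, not values from Table 1.

```python
def classify_curve(noun_percentages, threshold=50.0):
    """Classify one frequency pattern's curve over the duration ratios.

    "above":   never falls below the threshold (whole curve transposed upwards)
    "below":   never reaches the threshold (whole curve transposed downwards)
    "crosses": the duration cue still swings the majority judgment
    """
    if min(noun_percentages) >= threshold:
        return "above"
    if max(noun_percentages) < threshold:
        return "below"
    return "crosses"

# Hypothetical curves for three patterns (placeholders only).
curves = {
    "X": [55.0, 68.0, 79.0, 88.0, 93.0],
    "Y": [4.0, 9.0, 15.0, 24.0, 36.0],
    "Z": [12.0, 30.0, 51.0, 70.0, 85.0],
}
for name, curve in curves.items():
    print(name, classify_curve(curve))
```

Applied strictly, a range such as 49-95% only just fails the "above" test, which is why patterns like J are better described as having a similar, rather than identical, effect.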
TABLE 1.
Fig. 7. The effect on “noun” judgments of two patterns of fundamental frequency change
(see Table 1). 


The range and the mean of noun judgments for all patterns are given in Table 1, and 
it will be seen that the range for J is 49-95%, that for M is 3-49%. The influence 
of fundamental frequency change is not, however, confined to patterns giving rise 
to a familiar intonation. Both E and F produced an un-English intonation but nonetheless
evoked a large majority of noun judgments because of the inflection in the
first syllable. 
THE EFFECT OF DIFFERENT TYPES OF FREQUENCY PATTERN


A wide variety of patterns was used in this experiment in the hope of answering 
certain questions concerning the effectiveness of different types of fundamental 
frequency variation in determining stress judgments. The stimulus words contained 
three kinds of syllable : level syllables, syllables with a linear change of frequency 
and syllables with a curvilinear change. These syllables occurred sometimes as the 
first and sometimes as the second syllable of a stimulus word and it was possible by 
grouping the results to obtain some information on the relative power of these 
syllabic patterns to influence stress judgments. If we compare patterns A and B, for 
example, a noun judgment for A means that the subject heard a linear change of 
frequency as stressed in contrast to a level syllable. In B, a verb judgment means 
the same thing. But a verb judgment for A or a noun judgment for B means that 
the subject heard a level syllable as stressed in contrast to a linear change. Provided 
that the five duration ratios are equally represented in the samples, we can group 
sets of data together in this way and obtain some indication of the association between 
types of syllable and the judgment that the syllable is stressed. The first contrast 
treated in this way was that between inflected and level syllables. In all patterns that 
contained both a level and an inflected syllable, 66% of all inflected syllables were
judged stressed, compared with 33% of level syllables. This difference was highly
significant at the 1% level.


The two types of inflected syllable were compared in a similar way. For example, 
patterns A and J contain an inflected first syllable, in the one case a linear and in 
the other a curvilinear inflection. By comparing the number of noun judgments in 
this and in similar cases we gain a measure of the relative effectiveness of the two 
types of syllable. Of all syllables with linear frequency change, 62% were judged 
stressed, whilst 72% of the syllables with curvilinear change were heard as stressed. 
This difference is not significant. 


The last comparison made in this way was between rising and falling inflections. 
The intonation patterns of English involve both rising and falling tones and the 
word-pairs used in these experiments could certainly occur in contexts where noun 
and verb might both be required by the sentence intonation pattern to bear a rising or 
a falling tone. It would appear, therefore, that stress judgments should be
independent of the difference between rising and falling changes in fundamental
frequency. The result obtained by grouping the data was that 61% of rising syllables
were judged stressed and 64% of falling syllables, a difference that was not significant.


A final comment is necessary on this experiment with frequency patterns. The 
variations in frequency indicated in Table 1 should not be simply equated with 
English intonation patterns. Whilst it is true that many items appeared to have a 
fairly natural intonation, it cannot be assumed that this intonation was necessarily 
the one suggested by the frequency pattern. A preliminary attempt has been made
to correlate the intonation pattern with the frequency pattern by asking several trained 
listeners to note the intonation they heard in each item. It is clear from these 
judgments that a number of the vowels are so short that a change of fundamental 
frequency is not perceived and the syllable is judged to have a level tone. Other 
effects of this sort may appear as a result of further investigation on these lines. 


CONCLUSIONS 


The experiments reported in this paper represent an attempt to explore three 
physical dimensions which appear to be important in determining stress judgments 
in English : duration, intensity and fundamental frequency. The importance of the 
duration ratio is confirmed by the fresh data presented here ; it seems that in English, 
in a considerable variety of conditions, changes of vowel duration ratio can swing 
listeners’ perception of strong stress from the first to the second syllable in the type 
of disyllable that has been considered. There seems no reason to doubt that this 
factor operates in stress judgments in other rhythmic contexts. Intensity ratio has a 
similar influence but it is somewhat less marked. The data show no case in which 
change of intensity ratio caused a complete shift of the stress judgment from first to 
second syllable. 

Change in fundamental frequency differs from change of duration and intensity in 
that it tends to produce an all-or-none effect, that is to say the magnitude of the 
frequency change seems to be relatively unimportant while the fact that a frequency 
change has taken place is all-important. The experiments with a step-change of 
frequency show that a higher syllable is more likely to be perceived as stressed ; 
the experiments with more complex patterns of fundamental frequency change suggest 
that sentence intonation is an over-riding factor in determining the perception of 
stress and that in this sense the fundamental frequency cue may outweigh the 
duration cue. 

In conclusion, it may be necessary to reiterate that all judgments of stress in 
natural speech depend on the complicated inter-action of a number of cues. 
Experiments such as those described above require a drastic simplification of the 
conditions in which the judgment is made and even so there are still a number of 
factors which cannot be controlled until further work has been done in this field. The 
formant structure cue still remains to be investigated and it is quite probable that 
for English listeners, at least, the changes in vowel quality introduced by variations 
in formant structure may prove one of the most powerful factors in determining stress. 

The author wishes to thank Dr. F. S. Cooper and the staff of the Haskins 
Laboratories for their help in carrying out some of these experiments and 
Dr. C. A. B. Smith for suggesting methods of treating the data.



REFERENCES 


Bloch, B. and Trager, G. L. (1942). Outline of Linguistic Analysis (Baltimore).
Bloomfield, L. (1933). Language (New York).
Borst, J. M. and Cooper, F. S. (1957). Speech research devices based on a channel Vocoder. J. acoust. Soc. Amer., 29, 777.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. J. acoust. Soc. Amer., 27, 765.
Jones, D. (1949). An Outline of English Phonetics (Leipzig).






Clare o’ Molesey Ltd. (T.U.), 79 Bridge Road, East Molesey, Surrey. 















SOME CUES FOR THE DISTINCTION BETWEEN VOICED 
AND VOICELESS STOPS IN INITIAL POSITION* 


A. M. Liberman,** P. C. Delattre,*** and F. S. Cooper
Haskins Laboratories, New York 


Experiments with synthetic speech produced by the Pattern Playback indicated 
that the voiced stops in initial position could be made to sound like their voiceless 
counterparts by cutting back the beginning of the first-formant transition. Normally, 
the cutback of the first formant raises its starting frequency and also delays the time 
at which it begins relative to the other two formants. Both these changes appeared,
in general, to be necessary. With certain combinations of stop and back vowel, 
however, a delay in the onset of the first formant was by itself sufficient to produce 
a strong voiceless effect. Substituting noise for harmonics in the transitions of the 
second and third formants (for the duration of the first-formant cutback) increased 
the impression of voicelessness over that obtained with cutback alone. 

In judging stimuli that lay near phoneme boundaries, many listeners demonstrated 
what appeared to be a very high degree of acuity. It is possible that this is the 
result of long experience in the use of the language and thus represents an effect of 
learning on perception. 


Phonetic observations and the results of spectrographic analysis point to several 
acoustic features that may underlie the discrimination of voiced and voiceless 
stops in initial position. Of these, the presence or absence of vocal cord vibration 
ought, perhaps, to be considered first, since it is the nominal basis for the designation 
of the two classes as “voiced” and “voiceless.” On spectrograms this feature
should be visible, and often is, as a line in the low-frequency region, appearing
relatively earlier for the voiced than for the voiceless stops. Although this “voice
bar” may be adequate as a phonetic basis for distinguishing the two classes of
sounds, there is no reason to believe that it is necessarily of overriding importance
in perception.¹

In the case of English, phoneticians agree about a second characteristic, aspiration,
which is presumed to be present in initial stops uttered in the voiceless manner
and absent in their voiced counterparts. It is difficult to guess what all the acoustic
correlates of aspiration are likely to be, but it is reasonable to suppose that one
of its acoustic manifestations will be the presence of noise rather than harmonics
in the beginning parts of the formant transitions.² Observation of spectrograms
indicates that this might be the case, but the details are often unclear.

* This work was supported in part by the Carnegie Corporation of New York and in part
by the Department of Defense in connection with Contract No. DA49-170-sc-2564. A
portion of this paper was presented at a meeting of the Acoustical Society of America in
Ann Arbor, Michigan on October 24, 1957.

** Also at the University of Connecticut.
*** Also at the University of Colorado.

1 The distinction between voiced and voiceless stops (in initial position) is quite perceptible
in whispered speech. In itself, this must mean that the presence or absence of vocal cord
vibration cannot be the only cue.

Experimental work with synthetic speech suggests other possible cues. Thus 
there have been persistent indications that an important aspect of voicing is some- 
where to be found in the first formant. In this connection, we have known for 
some time that we could produce a strong voiced stop only by starting the first 
formant very low on the frequency scale (Delattre, Liberman and Cooper, 1955). 
Starting the first formant anywhere else weakens the voicing impression, though it 
cannot be said to produce a strong impression of voicelessness. In the case of 
released stops in final position, Malécot has found some indication that he could 
convert voiced stops into voiceless ones by reducing the intensity of the first 
formant of the release, or by omitting it entirely.³ Exploratory work by the authors
of this paper indicated that Malécot’s manoeuvre seemed to produce essentially the
same effect when applied to stops in initial position ; this is not at all surprising, 
since the release of the final stop is, in effect, the beginning part of a new syllable, 
albeit a short one. 

The experiments to be described in this paper were intended primarily to 
investigate further this last-mentioned feature—the elimination of a portion of the 
first-formant transition—as a cue for the distinction between voiced and voiceless 
stops in initial position.*** Investigations of the effects of the other two features 
described above, viz., vocal cord vibration and noise in the transitions, have also
been undertaken, and their results will be presented. 


2 Dr. Fischer-Jørgensen (1954) has presented a lengthy discussion of this and related matters.
3 Personal communication from Dr. André Malécot. 


4 Our interest in this cue was heightened as a result of a conversation with Dr. Gunnar Fant
in which he informed us that he had observed this effect in spectrograms, and suggested
that there were reasons for supposing that it would occur. Since this article was written
Dr. Fant has published these and related observations. See especially: G. Fant, Acoustic
Theory of Speech Production. Royal Institute of Technology (Division of Telegraphy-
Telephony) Report No. 10, 1958, Stockholm; and G. Fant, Modern instruments and
methods for acoustic studies of speech. Acta Polytechnica Scandinavica, Ph. Series No. 1, 1958.


5 It is not appropriate in this paper to attempt a detailed explanation of the first-formant 
intensity change in terms of articulatory-acoustic considerations, and, indeed, we are not 
at this point fully prepared to do so. In general, however, such an explanation is possible
on the basis of several factors which singly or in combination might produce the effect. 
One consideration, suggested to us by Dr. K. N. Stevens, is that the noise source of affrication 
may be weak in the low frequencies. A second suggestion, for which we are indebted to 
Dr. G. Fant, is that in English speech the vocal cords are open during the articulation of 
the aspirated portion of the voiceless stop, and that, in consequence, the lower resonance 
may be effectively “lost” in a large back cavity that extends beyond the open cords into 
the bronchial tubes. Also, there is at least a possibility that the noise source is not in
such a position as to excite the entire tract. 
Fig. 1. Hand-painted spectrographic patterns used to produce the most extremely voiced
stops (plus vowels). 


FIRST-FORMANT CUTBACK 


In this experiment we studied the effects of progressively eliminating the transition 
of the first formant. For that purpose we first prepared hand-painted spectro- 
graphic patterns like those shown in Fig. 1. These patterns were made in accordance 
with the results of several experiments (Liberman, Delattre, and Cooper, 1952 ; 


5 Mlle. Durand (1956) has found that increasing the rate of first formant-transition contributes 
some voicelessness to the stop consonants. We have not ourselves investigated this cue 
systematically, nor have we carefully compared its effects with those produced by reducing 
the intensity of the first-formant transition. It is our impression, however, that in the case 
of English the latter change has by far the larger and more realistic effect. 
Fig. 2. Spectrographic patterns illustrating the way in which the voice bar was removed
and the first formant was cut back to produce one stimulus series. The
patterns shown are appropriate for /b,p/ with /æ/. The numbers above the patterns
show the amount of first-formant cutback in msec.


Delattre, Liberman, and Cooper, 1955; Harris, Hoffman, Liberman, Delattre, and
Cooper, 1958) on the acoustic cues for the stop consonants and were intended to 
synthesize stop consonant-plus-vowel syllables, the bursts, transitions, and steady states 
being appropriate for /b/, /d/, and /g/ before /e/, /æ/, and /o/. Since the
intensity of the burst is itself a cue for the voiced-voiceless distinction, great care 
had to be exercised to make this feature essentially neutral. This was accomplished, 
separately for each pattern, on a trial-and-error basis. 

The patterns of Fig. 1 have the fully rising first formant which is presumed to 
produce the strongest voiced effect. Each of these patterns was then altered in the 
manner illustrated (for /b,p/ with /z/) in Fig. 2. The first step was to remove 
the voice bar and so produce the pattern labelled “0”. Successive cutbacks of the
first formant in steps of 10 msec. produced the stimuli labelled “10” to “50”. 
It should be emphasized here that the duration of cutback in the first formant
corresponds to the period of aspiration, during which the vocal cords are open. Any 
energy in the second and third formants during this period must, therefore, consist 
of noise rather than harmonics. In the first set of experiments, the second and 
third formants were nevertheless filled with harmonics. The effects of replacing the 
harmonics with noise will be dealt with in the third part of this paper. 


Cutbacks of the first formant analogous to those described above were made on 
the other synthetic stop-plus-vowel patterns. In this way nine series (three stops 
times three vowels), each consisting of seven stimulus patterns (the six cutback 
conditions plus the zero cutback with the voice bar) were produced. These patterns 
were converted into sound by the Pattern Playback, recorded on magnetic tape, and spliced
into a random order. The resulting tape was played for 28 listeners who were 
instructed to judge each sound as /b/, /d/, /g/, /p/, /t/, or /k/ and to guess 
if necessary. Of the 28 listeners, 20 were phonetically naive students at the 
University of Connecticut, five were workers at the Haskins Laboratories who had no 
special linguistic training, and three were trained linguists. 
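The construction of the stimulus tape can be summarized as a small enumeration. The sketch below is only an illustration. It assumes the vowels /e/, /æ/, /o/ and treats the voice-bar condition as a seventh pattern in each series. The names `build_stimuli` and `random_order` are hypothetical, and the actual randomization was done by splicing tape.

```python
import random

# Hypothetical encoding of the first experiment's stimulus set.
STOP_PAIRS = ["b,p", "d,t", "g,k"]          # labial, alveolar, velar
VOWELS = ["e", "æ", "o"]                    # assumed reading of the vowel symbols
# Zero cutback with the voice bar, then cutbacks of 0 to 50 msec without it.
CONDITIONS = ["voice bar"] + list(range(0, 60, 10))


def build_stimuli():
    """Every (stop pair, vowel, condition) pattern: 3 x 3 x 7 = 63."""
    return [(s, v, c) for s in STOP_PAIRS for v in VOWELS for c in CONDITIONS]


def random_order(stimuli, seed=None):
    """Return a shuffled copy, standing in for the spliced random order."""
    order = list(stimuli)
    random.Random(seed).shuffle(order)
    return order


stimuli = build_stimuli()
n_listeners = 28
print(len(stimuli), n_listeners * len(stimuli))  # 63 1764
```

Each listener judges each stimulus once, so the product agrees with the 1,764 judgments reported in the next paragraph.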

Before considering the major results of the experiment, we should note that 
in making their responses the listeners had to choose from among six alternatives. 
In effect, they had not only to determine whether the sound was voiced or voiceless, 
but also to identify its place of production (i.e., to decide whether it was a labial, 
alveolar, or velar stop). Out of a total of 1,764 judgments made, only 15 
represented “place” errors—that is, less than one per cent of the sounds were 
identified by the listeners as having a place of production other than that intended 
by the experimenters. This aspect of the data has, therefore, been ignored and in 
Fig. 3 the judgments for each “place” category have been plotted separately in
such a way as to show how the identification as voiced or voiceless varied as a function
of the changes in the stimulus variable. 
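Each of the six response labels encodes both a place and a voicing value, so the two can be scored separately. The following is a minimal sketch, assuming responses are stored as the six letters. `score` is a hypothetical helper. Following the text, voicing is tallied for every response regardless of place.

```python
# Each response label decomposes into (place of production, voiced?).
LABELS = {
    "b": ("labial", True), "p": ("labial", False),
    "d": ("alveolar", True), "t": ("alveolar", False),
    "g": ("velar", True), "k": ("velar", False),
}


def score(judgments):
    """judgments: (intended place, response label) pairs.

    Returns the number of place errors and the voicing value of every
    response; place errors are counted but not excluded from voicing.
    """
    place_errors = 0
    voiced = []
    for intended_place, label in judgments:
        place, is_voiced = LABELS[label]
        if place != intended_place:
            place_errors += 1
        voiced.append(is_voiced)
    return place_errors, voiced


# The reported totals: 15 place errors in 1,764 judgments.
print(f"{15 / 1764:.2%}")  # 0.85%
```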

It is seen from Fig. 3 that the first-formant cutback effectively converted voiced 
stops into voiceless. The alveolar stop /d,t/ required the greatest amount of cutback 
to be heard as voiceless, and the velars /g,k/ only slightly less, while the labials 
/b,p/ changed to voiceless with relatively small amounts of the first formant removed. 
Indeed, in the case of the labials many listeners judged them as voiceless when the 
voice bar had been taken away and before any part of the first-formant transition 
had been eliminated. This is the only case in which the effect of the voice bar can 
be seen, since all the alveolar and velar stops were heard as voiced by all the 
listeners whether the voice bar was in or out. 

The effect of the first-formant cutback appears to be largely independent of the vowel 
for /b,p/ and /g,k/; in the case of /d,t/, however, there is a tendency for the shift
to the voiceless member to occur at progressively larger cutbacks through the series
/e, æ, o/.

Although each stimulus was judged only once by each listener, it is nevertheless 
possible to assess the consistency of the individual subject’s responses. For this 
Fig. 3. Responses of 28 phonetically naive listeners as a function of amount of cutback
of the first formant. The data are shown as the percentage of listeners who identified
each pattern as a voiced stop. For the stimulus condition in which the voice bar preceded
the patterns, the responses are shown on the separate ordinate at the left of each graph.
The graphs are arranged in groups according to the vowels /e, æ, o/ with which the
consonant transitions were paired.


purpose we tabulated the responses of each subject separately against the various 
values of the stimulus variable. Since the stimuli were presented in random order, 
we can take it as an indication that the subject was self-consistent if it is possible to 
locate a dividing line on the stimulus series such that all responses on one side 
of the line are voiced and on the other side unvoiced—that is, if the subject has 
sorted perfectly across one step of the stimulus scale. The results of this kind of 
analysis are presented in Table I, where we see, for each stop paired with each 
vowel, the percentage of listeners who sorted the stimuli perfectly. (For the 
purposes of this analysis the voice-bar condition has been treated as if it were 
the first step on the stimulus continuum.) 
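The self-consistency criterion amounts to a search for a single boundary along the ordered stimulus series. The sketch below is one reading of it. It assumes the boundary must fall between two steps, so a listener who gives the same label to every stimulus is not counted as sorting. The function names are hypothetical. Each step holds a list of judgments, so the same test also applies when a stimulus is heard more than once.

```python
def sorts_perfectly(steps):
    """steps: judgments per stimulus step, ordered along the continuum
    (voice-bar condition first, then increasing cutback); True = voiced.

    True if some boundary between adjacent steps has only voiced
    judgments before it and only voiceless judgments after it.
    """
    for cut in range(1, len(steps)):
        before = [j for step in steps[:cut] for j in step]
        after = [j for step in steps[cut:] for j in step]
        if all(before) and not any(after):
            return True
    return False


def percent_perfect(listeners):
    """Percentage of listeners who sorted perfectly (the Table I quantity)."""
    return 100 * sum(sorts_perfectly(s) for s in listeners) / len(listeners)


V, U = [True], [False]                      # one voiced / unvoiced judgment
consistent = [V, V, V, U, U, U, U]
inconsistent = [V, U, V, U, U, U, U]
print(percent_perfect([consistent, inconsistent]))  # 50.0
```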

It can be seen that individual self-consistency is highest for /d,t/, slightly lower 
for /g,k/, and, relatively, quite a bit lower for /b,p/. Although it is difficult 
to define standards for a situation like this, it would appear in general that the
self-consistency is remarkably high. In the case of /d,t/ with /æ/, all subjects sorted
perfectly.

In addition to questions about self-consistency or variability of the individual, 
we can ask about variability between individuals. It is clear from the plots of 
group data in Fig. 3 that the inter-subject agreement must be quite high for /d,t/ 
and /g,k/, especially with /æ/. Indeed, for /d,t/ with /æ/, one can see from
the group data that 26 of the 28 listeners changed their judgments from /d/ to /t/
at the same point on the abscissa (30 msec.); the remaining two listeners changed
at a point one step removed (20 msec.). In the case of /b,p/, there is, as we have 
seen, considerably more intra-subject variability, and it becomes more difficult to 
compare individuals. It is possible, however, to find enough listeners who were 
sufficiently consistent with their own judgments to permit conclusions concerning 
differences between individuals (as distinct from random variability). By examining 
the responses of these listeners, we find that individual differences are indeed 
relatively greater for /b,p/ than for /d,t/ or /g,k/. 
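Inter-subject agreement of the kind described for /d,t/ can be read off by finding, for each self-consistent listener, the step at which judgments change from voiced to voiceless, and then tallying those points. This is a sketch under the assumption of one judgment per stimulus, as in the first experiment. The helper names and example listeners are hypothetical and are not the experimental data.

```python
from collections import Counter

STEPS = ["voice bar", 0, 10, 20, 30, 40, 50]   # msec of first-formant cutback


def crossover(judgments, steps=STEPS):
    """judgments: one True/False (voiced?) per step, in continuum order.

    Returns the first step heard as voiceless for a listener who sorted
    perfectly, or None when there is no single voiced-to-voiceless change.
    """
    if False not in judgments:
        return None
    k = judgments.index(False)
    if k == 0 or any(judgments[k:]):
        return None
    return steps[k]


def tally_crossovers(listeners):
    """Count how many consistent listeners changed at each step."""
    points = (crossover(j) for j in listeners)
    return Counter(p for p in points if p is not None)


# Hypothetical listeners: two change at 30 msec, one at 20 msec.
T, F = True, False
listeners = [[T, T, T, T, F, F, F], [T, T, T, T, F, F, F], [T, T, T, F, F, F, F]]
print(tally_crossovers(listeners))  # Counter({30: 2, 20: 1})
```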
To provide a further test of these conclusions about intra- and inter-subject
variability, we arranged several additional random orders of the stimuli and presented 
each order a number of times to each of 11 phonetically naive subjects. In this 
way we built up a sufficient number of responses to enable us to plot, separately 
for each subject, functions similar to those of Fig. 3. These plots, which are not 
shown here, fully support the conclusions reached on the basis of the group data. 
Both intra- and inter-individual differences are very small for /d,t/ and /g,k/ 
and relatively larger for /b,p/. 

These facts about intra- and inter-individual differences may mean that the 
first-formant cutback is less important for /b,p/ than for /d,t/ or /g,k/. On the 
other hand, it is possible that these results will prove to be peculiar to the types of
patterns used or to the general methods of synthesis. Thus, it may be that the 
synthetic labial stops are less realistic than the others and that this produces the 
relatively greater variability we observed. Similar considerations must, of course, 
be taken into account in interpreting the fact that the labial, alveolar, and velar stops 
changed from voiced to voiceless at different amounts of first-formant cutback. 
Certain constant aspects of the patterns (such as the relative strength of the burst) 
may have contributed to voicing or voicelessness. If this were the case, the shift 
from voiced to voiceless might well occur at different first-formant cutbacks in 
different experiments, depending on the particular balance of cues. In exploratory 
work with quite a wide variety of patterns, however, we have obtained results that 
are, in general, much like those reported for the particular patterns of this experiment, 
and we think, therefore, that those aspects of the results being discussed here do 
have considerable generality. 

Comparisons among the stops in regard to the relative importance of the first- 
formant cutback or the amount of cutback required do not in any event affect the 
major conclusion of this part of the study: namely, that the presence or absence of 
a “cutback” in the first formant is a sufficient cue, and very likely an important one,
for distinguishing voiced and voiceless stops in initial position. 


EXPERIMENTAL SEPARATION OF TIME AND FREQUENCY 
CHANGES INVOLVED IN THE FIRST-FORMANT CUTBACK 


In the patterns of the first series of experiments, the cutback of the first formant
produced two correlated changes in the stimulus: (1) a progressively greater elevation 
in the starting frequency, and (2) a progressively greater delay in the time at which 
the first formant began. In the second series, we carried out a number of studies 
in an attempt to separate these two variables and to determine the role of each in 
the effects just described. In general we have found that both factors are important: 
a rising first formant is a cue for the class of voiced stops, and a time delay in the 
first formant, without the rising transition, is a cue for the voiceless stops. Thus, 
a continuum of patterns from voiced to voiceless can, in general, be produced only 
Fig. 4. Illustrations of the patterns used to study the effects of the first-formant
time of onset on the perceived distinction between /d/ and /t/. The numbers
above the patterns show the extent of cutback in msec.


by raising the first-formant starting frequency and also delaying its onset. This is
what we did in the first experiments, and this is presumably what occurs in nature. 
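The geometry of a rising transition shows why the two changes are correlated. Assume, purely for illustration, a straight-line first-formant transition of duration T rising from F_start to F_steady. Removing its first c msec leaves a formant that begins at t = c, with a starting frequency of F_start + (F_steady − F_start)·c/T. The function name and the frequencies and duration in the usage lines are hypothetical, not measurements from the patterns of Fig. 1.

```python
def cutback_onset(cutback_ms, f_start_hz, f_steady_hz, transition_ms):
    """Onset time (msec) and starting frequency (Hz) of a first formant whose
    linear rising transition has had its first `cutback_ms` removed.

    Assumes a straight-line transition; real hand-painted patterns may curve.
    """
    if cutback_ms >= transition_ms:
        # The whole transition is gone: the formant starts at its steady state.
        return cutback_ms, f_steady_hz
    fraction = cutback_ms / transition_ms
    return cutback_ms, f_start_hz + fraction * (f_steady_hz - f_start_hz)


# Hypothetical values: 120 Hz rising to 720 Hz over 50 msec.
for c in range(0, 60, 10):
    onset, freq = cutback_onset(c, 120, 720, 50)
    print(f"cutback {c:2d} msec -> onset {onset} msec, start {freq:.0f} Hz")
```

Both returned values grow together with c. Separating them therefore requires special patterns, such as that of Fig. 4, in which only the onset is delayed.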

In a few special cases, however, it is possible to go from voiced to voiceless by 
making only one change. We were motivated to try to find such a set of stimuli by 
the desire to know whether the high degree of accuracy with which listeners identified 
the stimuli in the first experiment could be maintained when the physical difference 
was reduced to a single acoustic dimension—in this case, time of onset of the first 
formant. 

One of the special cases in which time of onset of the first formant is the only 
variable is shown in Fig. 4. When the stop is in front of /o/, as in this case, the 
first formant does not have far to rise, and, given a large second-formant transition, 
a reasonably compelling voiced stop can be produced with a first formant like that 
shown.⁷ The various patterns in the figure show successive delays of 10 msec. in
the onset of the first formant. These patterns were recorded on magnetic tape and
assembled into three random orders, each stimulus pattern being represented twice


7 The triangular patch just below and to the left of the first formant was added because it
was found, in this special case, to increase the realism of the sound. It may function as 
a voice bar or possibly, in combination with the first formant, as a transition. In either 
event its position is constant in relation to the first formant, and the only difference among 
the patterns is in the time of onset of the first formant-plus-triangular-patch. 
in each order. All of these stimuli were then presented to a group of 27 phonetically
naive listeners for judgment as /d/ or /t/. The judgments of these listeners, plotted in 
Fig. 5, make it clear that the delay in first-formant onset is sufficient to cue the 
distinction between /d/ and /t/. More interesting, perhaps, is the fact that quite 
a few listeners sorted these randomly presented stimuli perfectly across a stimulus 
difference equal to only 10 msec. of delay in the first formant. Out of the total 
group of 27 subjects, 7 sorted perfectly on the first random order, 12 on the second, 
and 15 on the third. (It should be remembered in this connection that going through 
a single random order means going through the stimulus set twice.) In effect, these 
listeners were applying phoneme labels with perfect consistency to stimuli which 
differed by 10 msec. of first-formant delay. Given the appropriate non-speech 
controls, it should be possible in future research to determine whether this represents, 
as we think it does, an exceptionally great sensitivity brought about by long experience 
with the language. 
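The presentation scheme for this test (three random orders, each containing every pattern twice) can be sketched as follows. The list of delay values is a hypothetical stand-in for the patterns of Fig. 4, whose exact number is not restated here. `make_orders` is an illustrative name.

```python
import random

# Hypothetical stand-in for the Fig. 4 patterns: first-formant delay in msec.
DELAYS_MS = [0, 10, 20, 30, 40, 50]


def make_orders(stimuli, n_orders=3, reps=2, seed=0):
    """Build n_orders random orders, each containing every stimulus reps times."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_orders):
        order = list(stimuli) * reps
        rng.shuffle(order)
        orders.append(order)
    return orders


orders = make_orders(DELAYS_MS)
per_listener = sum(order.count(DELAYS_MS[0]) for order in orders)  # 6
listeners = 27
print(per_listener, listeners * per_listener)  # 6 162
```

The resulting count matches the 162 judgments per stimulus given in the caption of Fig. 5.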


NOISE IN PLACE OF HARMONICS IN THE FORMANT TRANSITIONS 


As was pointed out earlier in this paper, it is to be presumed that the first-formant 
cutback corresponds to the period of aspiration. If so, the second and third formants 
should, of course, contain noise rather than harmonics for the duration of the first- 
formant cutback. This noise might be expected to be a cue for voicelessness, either 
by itself or in combination with the cutback. In this part of the study we have 
undertaken to measure the effect of combining noise in the second and third formants 
with first-formant cutback, and to compare this effect with that produced by first- 
formant cutback alone (harmonics rather than noise in the transitions of the second 
and third formants, as in the first experiment) and by noise alone (noise in all three 
transitions without cutback of the first formant). For this purpose we used the 
Voback, a relatively new synthesizer (Borst and Cooper, 1957). Like the Pattern 
Playback used in the first two parts of this study, the Voback converts hand-painted 
spectrograms into sound; indeed, the same spectrogram can be used with either 
instrument. There are many differences between these synthesizers, but for the 
present study the most relevant one is that with the Pattern Playback the experimenter 
can use only an harmonic series of tones as a source, while with the Voback he can use 
either an harmonic series or noise (though not both at the same time), as he wishes. 
It may prove to be of some consequence for this experiment that the Voback is 
significantly superior to the Pattern Playback in signal-to-noise ratio. 

Taking advantage, then, of the possibility of putting noise rather than harmonics 
into the formant transitions, we experimented with the types of stimulus conditions 
illustrated in Fig. 6. The pattern at the left, which is very similar to the zero cutback 
pattern of Fig. 2, is a starting point from which the changes illustrated in the 
other patterns are made. It has the fully rising first formant previously found to be 
a voicing cue combined, in this case, with place cues (burst and transitions of second
and third formants) appropriate for /d/ before /o/. The next pattern to the right,
Fig. 5. Responses of 27 phonetically naive listeners to the patterns in which time of onset
of the first formant was varied (see Fig. 4). Each listener judged each stimulus six times,
making a total of 162 judgments per stimulus. The data are shown as the percentage of
times each stimulus was heard as /d/.
labelled “noise alone,” differs only in that noise has been substituted for harmonics
in all three formants.⁸ Third from the left is the “cutback alone” condition, which

8 In this case, and in all others in which noise was substituted for harmonics, the noise
was 2 dB down from the harmonics. These measurements were made with a VU meter,
the noise and harmonics being in essentially steady state during the measurement.

essentially duplicates the cutback patterns of the first experiment. The “noise
plus cutback” at the extreme right differs from “cutback alone” only in that noise
has replaced harmonics in the second and third formants for the duration of the
cutback of the first formant. In all cases the stimuli were changed in steps of
10 msec.; the patterns shown in Fig. 6 illustrate the extreme, or 50 msec., condition.
The stimulus conditions described here were arranged for /b,p/, /d,t/, and /g,k/ 
paired in all combinations with /e/, /æ/, and /o/, a total of 54 stimuli. (As was
pointed out above, the patterns shown in Fig. 6 are for /d,t/ before /o/.)
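The three Voback conditions differ only in which source excites each formant during the first c msec. The sketch below encodes that timing under stated assumptions. Formants are labelled F1 to F3, and the total pattern length of 300 msec is a hypothetical value. The plan records only the excitation source, not formant frequencies, so the rising first formant of “noise alone” is not represented. The Voback can use noise or harmonics but not both at once, so a helper checks that no instant calls for both. Function names are hypothetical.

```python
def source_plan(condition, cutback_ms, total_ms=300):
    """Return {formant: [(start_ms, end_ms, source), ...]} for one pattern.

    condition: "noise alone", "cutback alone", or "noise plus cutback".
    Only the excitation source is modelled; formant frequencies are ignored.
    """
    c = cutback_ms
    plan = {}
    for formant in ("F1", "F2", "F3"):
        if condition == "noise alone":
            # Noise in all three formants for c msec, no cutback.
            segments = [(0, c, "noise"), (c, total_ms, "harmonics")]
        elif condition == "cutback alone":
            # F1 silent for c msec; F2 and F3 carry harmonics throughout.
            segments = ([(c, total_ms, "harmonics")] if formant == "F1"
                        else [(0, total_ms, "harmonics")])
        elif condition == "noise plus cutback":
            # F1 silent for c msec; F2 and F3 carry noise during the cutback.
            segments = ([(c, total_ms, "harmonics")] if formant == "F1"
                        else [(0, c, "noise"), (c, total_ms, "harmonics")])
        else:
            raise ValueError(f"unknown condition: {condition}")
        # Drop empty segments, e.g. the noise segment when c == 0.
        plan[formant] = [s for s in segments if s[0] < s[1]]
    return plan


def sources_at(plan, t_ms):
    """Set of sources active at time t_ms across all formants."""
    return {src for segs in plan.values() for a, b, src in segs if a <= t_ms < b}


plan = source_plan("noise plus cutback", 50)
print(sources_at(plan, 20), sources_at(plan, 60))  # {'noise'} {'harmonics'}
```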

Exploratory work with these various patterns made it clear that the use of noise 
alone (i.e., substitution of noise for harmonics in all three formants) produces essentially 
no voicelessness. This conclusion was based on the judgments of phonetically naive 
listeners. We proceeded, then, to look into the possibility that the rising first formant 
in the “noise alone” pattern (see Fig. 6) was such a potent voicing cue as to override
any impression of voicelessness contributed by the noise in the formants. For these
experiments we straightened the first formant in patterns that were otherwise like
the one labelled “noise alone.” Listening to these patterns made it clear that while
this manoeuvre may have reduced the effect of voicing, it did not produce voiceless
consonants. 

The conclusion that voicelessness in stops is not produced by filling all the 
transitions with noise (without cutting back the first formant) is strengthened 
by suggestions that have come from some as yet incomplete studies on /h/. Here 
we find that steady-state (“straight”) formants which are filled with noise for the first
50 msec. and then with harmonics for the remainder of the sound produce an 
impression of a brief whispered vowel followed by a longer voiced vowel. If, with 
this same pattern, we remove the first formant for the duration of the initial noise 
section, we hear something more closely approximating /h/ followed by a voiced 
vowel. To see these results in relation to those obtained in the attempt to produce 
voiceless aspirated stops, one need only take account of the common assumption that 
aspiration is to be equated, at least roughly, with /h/. 

The exploratory work with the patterns of Fig. 6 confirmed that, while putting 
noise into the formants does not convert voiced stops into voiceless ones, cutting 
back the first formant is quite effective for this purpose, as it was in the first 
experiments. Further exploratory work indicated that substituting noise in the 
second and third formants seemed to produce a greater impression of voicelessness 
than can be obtained with first-formant cutback alone. To permit a better comparison 
of the effects of “cutback alone” and “cutback plus noise,” we obtained judgments
from 32 phonetically naive listeners. The stimuli they judged were those illustrated 
by the two patterns at the right in Fig. 6 and described earlier in this section of 
the paper; the “noise alone” condition was omitted. Thus, for each stop-vowel
combination there were six patterns of first-formant cutback alone (from 0 cutback 
through 50 msec. in steps of 10 msec.) and six patterns of noise plus cutback 
(again from 0 cutback through 50 msec. in steps of 10 msec.). These conditions 
Fig. 7. Responses by 32 phonetically naive listeners to patterns designed to compare the
effects of cutback alone and noise plus cutback (see Fig. 6). The data are shown as
the percentage of listeners who identified each of the patterns as a voiced stop. The
graphs are arranged in the rows according to the experimental conditions and in the
columns according to the vowel with which the consonant transitions were paired.
were arranged for each of the three stops paired with each of three vowels, /e/,
/æ/, and /o/.

As in the earlier experiment, the listener had to identify each sound according 
to place and voicing. Out of a total of 3,456 judgments made, only 87 (or 2.5%)
were in error as to place. We have, therefore, ignored this aspect of the subjects’ 
judgments, and, as in the first part of the paper, have dealt only with the judgments 
of the sounds as voiced or voiceless. In Fig. 7 the judgments are plotted against 
the experimental variables: for the data plotted in the top row the variable was 
first-formant cutback alone; in the bottom row the variable was first-formant 
cutback with noise in the second and third formants for the duration of the cutback. 
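The totals in this paragraph follow from the design: nine stop-vowel combinations, twelve patterns per combination (six cutback steps in each of the two conditions), and 32 listeners. This assumes one judgment per pattern per listener, which is consistent with the reported 3,456. A short check of that arithmetic, including the place-error rate:

```python
stop_vowel_pairs = 3 * 3
patterns_per_pair = 6 + 6        # cutback alone + noise plus cutback, 0-50 msec
listeners = 32

total = stop_vowel_pairs * patterns_per_pair * listeners
place_errors = 87
print(total, f"{place_errors / total:.1%}")  # 3456 2.5%
```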

Clearly, cutting back the first formant on the Voback produced a significant shift
toward voicelessness. In the graphs of the bottom row it is seen that a further 
contribution to voicelessness was made by substituting noise in the second and third 
formants for the duration of the cutback in the first formant. 

The condition of cutback alone essentially duplicates with the Voback the experiment
previously done on the Pattern Playback and reported in the first part of the paper. 
A comparison of the results obtained in the two studies shows that the first-formant 
cutback was somewhat less effective in the present experiment than in the earlier one. 
There is, however, a very close correspondence in all respects between the results 
obtained in the earlier experiment and those obtained with the Voback in the 
condition of first-formant cutback plus noise. Bearing in mind that there is 
considerably less background noise in the Voback than in the Pattern Playback, we 
can assume that the absence of noise in the transitions is more noticeable in sounds 
produced by the former machine than in sounds produced by the latter. Given 
the noise background in the sounds produced by the Pattern Playback, the listener
might well have “ filled in” with noise wherever appropriate, and it would, then, 
have made less difference whether we had put noise into the transitions or not. 
On this basis we should suppose that the data obtained with the Voback show a clearer 
separation of effects, and we should conclude that, while cutting back the first 
formant is a most effective cue to voicelessness, it is, perhaps, somewhat less effective
than we might have been led to suppose from looking at the data obtained with 
the Pattern Playback. We should conclude, further, that the addition of noise to 
the second and third formants for the duration of cutback in the first formant 
increases voicelessness, even though the addition of noise alone (without cutback) 
is not at all effective for this purpose.


THE INCIDENTAL LEARNING OF ACCESSIBLE NAMES
AND DEFINITIONS 


LEONARD W. DOOB
Yale University 


An experiment was conducted to test and improve three hypotheses concerning 
the effects of exposing adults to varying combinations of names, definitions, and 
referents when the motivation to learn was at a minimum and when instructions 
could be easily executed. Three balanced designs were employed: in one, drawings 
representing the referents of ten ambiguous names and ten connotative definitions 
were present initially during learning and also subsequently during testing; in another
they were present only initially; and in a third only subsequently. It was found, in
the first place, that denotative definitions could be accurately remembered or utilized 
under most conditions. Responding with a connotative definition to a name, secondly, 
was the least efficient method of learning but only in the absence of a denotative 
definition. Verbal responses, thirdly, were more efficiently recalled than verbal 
stimuli, and stimulus names more efficiently than stimulus connotative definitions, 
provided that they had been learned in the presence of a referent designated by 
the investigator. In general, the importance of the denotative definition was 
demonstrated. 


Access to new names and definitions is often if not usually easy, and whatever 
learning occurs is likely to be incidental. Dictionaries are at hand from which the 
meaning of a strange word can be obtained without a deliberate effort to learn and 
then retain the proffered definition. Or a speaker affixes a new or old term to a 
situation that he is describing or exhibiting with the hope that his audience, although 
seeking entertainment and not knowledge, will remember the verbal label. The study 
of names and definitions being described here was conducted under conditions of 
ready accessibility and of very low motivation to learn, in order to reproduce 
experimentally the state of affairs in real life. Under such conditions, what factors 
affect the learning of verbal materials?

A limited number of distinctions must be made at the outset. Names and 
definitions function as stimuli or responses. When one is asked to define “triangle,”
the name is the stimulus to which the definition becomes the response. Then
connotative definitions are crudely contrasted with denotative ones: the former use
words exclusively, the latter point to referents or symbolic representations thereof.
When the name, the connotative definition, or the referent for the denotative definition
is provided by another person or by the environment, it is designated; when it is
provided by the person himself, it is contrived. Here the denotative definition is to
be designated: “Which of these figures is a triangle?” Now it must be contrived:
“Draw a triangle.”

Three simple hypotheses guided the planning of the research. They were based 
upon psychological assumptions whose bare outlines are given below. They were to 
be modified and improved, it was hoped, by the experiment. Under conditions of 
easy accessibility and low motivation to learn, then, it was anticipated that:

1. A denotative definition would be learned more readily than a connotative one,
a contrived denotative definition more readily than a designated one. The hypothesis 
assumes that a motivational increment results from the greater effort or greater 
discrimination that is required in each of the first instances than in each of the second. 

2. The sequence of definition followed by name would be learned more readily 
than the sequence of name followed by definition. The hypothesis rests on the 
unproven assumption that, from childhood on, people are asked or ask themselves 
more frequently to define terms than to supply names for definitions; the unusual
sequence (definition: name), therefore, provides an increment for learning not offered 
by the usual one (name: definition). 

3. A name or definition would be learned more readily when it functions as a 
response than as a stimulus. The initial response to a stimulus, it is assumed, involves 
perception; such an internal response leads to other responses (including the goal
response) and hence is an instrumental response; a goal response is better retained
than an instrumental one. 


METHOD 


The materials consisted of the following: 

(1) Names: ten two- or three-word phrases which were ambiguous or misleading 
without further definition: single-waved triangle, leftless man, well-stacked house, 
correct arrow, empty circle, fifth hand, cubed flag, silent bell, round square, double 
apple. 

(2) Connotative definitions: ten verbal definitions corresponding to the ten names
above. “Single-waved triangle,” for example, was defined as “a triangle with two
straight lines and one wavy one”; “double apple” as “an apple with two stems.”

(3) Denotative definitions: ten sets of line drawings which were projected upon
a central screen, one set at a time. Each set contained four very similar drawings, only 
one of which corresponded to the appropriate name or connotative definition. To 
be certain that the definitions would be correctly utilized as easily as the other verbal 
materials, the drawings were pre-tested and then altered until perfect identification 
scores emerged. 

Three experimental designs were employed in order to study a limited number 
of possible combinations of the materials. Exposure hereafter refers to the conditions 
under which incidental learning originally occurred, testing to the conditions under 
which that learning was subsequently measured. In Design A, the drawings were 
present during both exposure and testing; in Design B they were present only during
testing; and in Design C only during exposure. Precise details concerning each
design are given below. To shorten the presentation, letters are used to indicate the
verbal elements: n stands for name, cd for connotative definition, and dd for denotative 
definition. If those letters stand alone, the name or definition is designated ; if they 
are preceded by an x, it is contrived. In the description of the conditions, letters to
the left of the colon refer to the stimulus, those to the right to the response; in the
description of the testing, parentheses around the letters signify that the stimulus
consisted only of the word “name” or “definition” and not the specific name or
definition actually employed during exposure. 


Design A 

Exposure: drawings present under five conditions: (1) dd: xn; (2) dd: xcd;
(3) n + dd: xcd; (4) n + cd: dd; (5) cd + dd: xn. Under condition 1,
for example, the instruction was: “What name can you give to Figure B to
distinguish it from the other three?” Or condition 4: “A correct arrow is
defined as an arrow that points to the right; which one of the four figures is
a correct arrow?”

Testing: drawings present under four conditions: (1) dd: recall n or xn; (2) dd:
recall cd or xcd; (3) (n): recognize dd; (4) (cd): recognize dd. Under condition
1, for example, the instruction was: “What name did you previously give to
Figure B?” Or condition 4: “Which one of the four figures was previously
defined?”

Groups: 16. Not all 20 combinations (5 exposure × 4 testing conditions) were
feasible; e.g., subjects exposed to condition 1 could not be tested under
condition 2 since connotative definitions had been totally absent during exposure.


Design B 

Exposure: drawings absent under three conditions: (1) n: xcd; (2) cd: xn; 
(3) n + cd: xdd. Under condition 3, the instruction was: “Draw a figure that
is named a cubed flag and is defined as a flag with three stars.” 

Testing: drawings present under four conditions: same as Design A except that
conditions 3 and 4 required the subjects not to recognize the drawings but to
contrive denotative definitions by identifying “one of the four figures . . .
previously” named or defined.

Groups: 12.


Design C 

Exposure: drawings present under three conditions: same as exposures 3, 4, and
5 of Design A.

Testing: drawings absent under two conditions: (1) (dd): recognize n; (2) (dd):
recognize cd. Under condition 2, for example, the instruction was, “What
definition was previously given the one figure in the group to which you paid
special attention?”; there followed four connotative definitions, one of which
was ticked off.

Groups: 4. Not all 6 combinations (3 exposure × 2 testing conditions) were
feasible, as indicated under Design A.
Each of the 32 experimental groups contained 6 subjects. The materials were
always presented in the same order. The relative position of the four drawings of 
each set was randomly varied for their second appearance during testing in Design A. 
Instructions for each of the 10 tasks during exposure and during testing had to be 
executed within 30 seconds. During the exposure the subjects were unaware that 
they would be subsequently asked to recall, recognize, or identify some of the 
materials. To give them the impression that the experiment had been completed 
at the end of the exposure period, furthermore, they were set to work upon an 
intriguing and solvable arithmetical problem (which none of them solved) for six 
minutes before the beginning of the testing period. 

College students representing three comparable samples participated as groups 
during their scheduled classes. Each group was given one of the three experimental 
designs. Replies were recorded in booklets, each page of which contained relevant 
instructions and was turned over only after a signal from the investigator. Different 
booklets corresponding to the required experimental groups were randomly and 
simultaneously distributed, but the subjects remained unaware of the fact that more 
than one set of instructions was circulating. Odd-even coefficients of correlation 
for the 10 testing measures varied from +0.79 to +0.93 (uncorrected).
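An odd-even coefficient is a split-half reliability. For a single testing measure, each subject's total on the odd-numbered items is correlated with the total on the even-numbered items. "Uncorrected" most likely means that the Spearman-Brown step-up to full test length, 2r / (1 + r), was not applied. The sketch below shows both quantities. The function names and the 10-item records are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def odd_even_r(item_scores):
    """Correlate odd-item totals with even-item totals across subjects.

    item_scores: one list of 0/1 item scores per subject, items 1..k in order.
    """
    odd = [sum(items[0::2]) for items in item_scores]   # items 1, 3, 5, ...
    even = [sum(items[1::2]) for items in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd, even)


def spearman_brown(r_half):
    """Step a half-test correlation up to the full test length."""
    return 2 * r_half / (1 + r_half)


# Invented 10-item records for five subjects (1 = correct, 0 = wrong).
subjects = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 1, 1, 0],
]
r = odd_even_r(subjects)
print(round(r, 2), round(spearman_brown(r), 2))
```

Applied to the reported range, the correction would raise both ends somewhat. For example, an uncorrected 0.79 becomes roughly 0.88.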

Complete objectivity in scoring could be achieved when the subjects were 
required to recognize names or definitions or, in Design B, to identify the drawings 
previously named or defined by ticking off one of four alternatives. Since only 
names which were recalled exactly as designated or contrived were scored as 
correct, perfect agreement between the investigator and an assistant was attained. 
In 96 per cent of the instances there was also agreement in connection with the 
recall of definitions since, to be counted as correct, essential words or phrases had 
to be recalled exactly. 


RESULTS 


Perfect recall, recognition, or identification scores were recorded for 64 per cent 
of the 192 subjects. Significant deviations from the highest possible score of 10 
occurred only in certain experimental groups, of which there were 4 out of 16 
groups in Design A, 6 out of 12 in Design B, and 0 out of 4 in Design C. Under 
these circumstances, Bartlett’s test showed that the variances within the groups were
not homogeneous; hence raw scores were subjected to a √(x + 0.5) transformation,
as a result of which significant differences with respect to variances were eliminated.
The analyses of variance produced for Designs A and B p-values of less than 0.05
for exposure and testing methods and for the interaction between the two; but for
Design C the values were not significant. The results from Design C immediately
indicate non-confirmation of the third hypothesis (responses are learned more readily
than stimuli) when learning was tested by the method of recognition.
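The √(x + 0.5) transformation is a standard variance-stabilizing device for count-like scores. Bartlett's statistic is the homogeneity-of-variance check mentioned above. The sketch below computes both, and it is illustrative only: the function names and the three groups of scores are invented, not taken from the experiment.

```python
from math import log, sqrt


def sqrt_transform(scores):
    """Square-root transformation sqrt(x + 0.5), applied score by score."""
    return [sqrt(x + 0.5) for x in scores]


def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def bartlett_statistic(groups):
    """Bartlett's statistic for equal variances; compare with chi-square on k - 1 df.

    Each group needs at least two scores and a non-zero variance, because the
    statistic takes the logarithm of every group variance.
    """
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [sample_variance(g) for g in groups]
    total_df = sum(dfs)  # N - k
    pooled = sum(df * v for df, v in zip(dfs, variances)) / total_df
    numerator = total_df * log(pooled) - sum(df * log(v) for df, v in zip(dfs, variances))
    correction = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (k - 1))
    return numerator / correction


# Invented raw scores (maximum 10) for three groups of six subjects.
raw = [
    [10, 10, 9, 10, 8, 10],
    [6, 9, 4, 10, 7, 5],
    [10, 9, 10, 10, 10, 9],
]
print(bartlett_statistic(raw))
print(bartlett_statistic([sqrt_transform(g) for g in raw]))
```

The docstring states a practical limitation: a group in which every subject scored 10 has zero variance, so the statistic cannot be computed for it as written.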
The analysis could then be conducted by comparing the scores of various relevant
and equivalent experimental groups either by means of t tests or by χ². The t tests
were based upon raw scores, and the χ² values, always corrected for continuity, were
computed by dichotomizing the scores into perfect ones of 10 and imperfect ones of
less than 10. Unless otherwise indicated, differences can be assumed to be significant
at the 0.01 level; to simplify the presentation, all tests, including those of the three
hypotheses, are two-tailed. Each of the principal conclusions to be drawn is first
presented below in an inductive manner and then subsequently related to the three
hypotheses being appraised.
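The χ² comparisons described above use a 2 × 2 table: two groups by perfect versus imperfect scores. They apply the continuity (Yates) correction. The sketch below assumes the usual form χ² = N(|ad − bc| − N/2)² / [(a+b)(c+d)(a+c)(b+d)], clipped at zero. The function names and the scores are hypothetical.

```python
def dichotomize(scores, maximum=10):
    """Split scores into (perfect, imperfect) counts; perfect means == maximum."""
    perfect = sum(1 for s in scores if s == maximum)
    return perfect, len(scores) - perfect


def yates_chi_square(group_a, group_b, maximum=10):
    """Continuity-corrected chi-square (df = 1) for groups (rows) by
    perfect / imperfect scores (columns)."""
    a, b = dichotomize(group_a, maximum)
    c, d = dichotomize(group_b, maximum)
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero; chi-square is undefined")
    # Yates's correction: shrink |ad - bc| by n/2, never below zero.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    return n * corrected ** 2 / margins


# Invented recall scores for two groups of six subjects.
perfect_group = [10, 10, 10, 10, 10, 10]
mixed_group = [10, 7, 9, 8, 6, 9]
chi2 = yates_chi_square(perfect_group, mixed_group)
print(round(chi2, 2))  # 5.49; critical values for df = 1: 3.84 (0.05), 6.63 (0.01)
```

With only six subjects per group the expected cell counts are small, which is why the continuity correction matters. The invented example would be significant at 0.05 but not at the 0.01 level the text takes as its default.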


Since their presence or absence distinguished the three experimental designs,
the first conclusion pertains to the learning of denotative definitions: the accuracy
with which denotative definitions could be remembered or utilized was unrelated to
whether they had been previously designated by the investigator or the subjects,
contrived by the subjects, or completely absent. The mean recognition scores for
the drawings in all experimental groups of Design A (in which the drawings had been
designated by the investigator or subjects during exposure) and, with an exception
to be discussed in the next paragraph, the mean identification scores of Design B
(in which drawings had been either absent or contrived by the subjects) were
virtually perfect. The second section of the first hypothesis, suggesting that
contrived denotative definitions would be more readily learned than designated ones,
consequently, was not verified. Its first section, pertaining to an anticipated
superiority of denotative over connotative definitions, was also contradicted when
either recognition or recall was the method of testing. With recognition, the mean
scores for connotative definitions in Design C (drawings absent during testing)
approached the maximum of 10. An additional control group sketched its own
drawings during exposure and then achieved a recall score of 9.7 for these contrived
denotative definitions; but that figure was not significantly different from the
mean recall score for connotative definitions under identical exposure and testing
conditions.


The second conclusion focuses upon different exposure methods: contriving a
connotative definition in response to a name which was unaccompanied by a
designated denotative definition was the least efficient condition for learning, regardless
of the testing method subsequently employed. With the drawings absent during
exposure in Design B, a group that had been instructed, for example, to “define
briefly a figure that is named an empty circle” did not efficiently recall the name
or the contrived connotative definition, nor accurately identify the correct drawing.
The mean recognition-recall score for the inferior exposure method was 6.8, whereas
that for contriving a name in response to a connotative definition was 9.2 and that
for contriving a denotative definition in response to a name and a connotative definition
was 9.1; both differences are significant beyond the 0.01 level. Even the identification
scores for the drawings decreased when that method was used, a fact which accounts
for the previously noted exception to the tendency for the drawings to be very
accurately identified and recognized. This decided inferiority of the sequence of
name followed by definition in Design B thus verified an aspect of the second
hypothesis. In Design A, on the other hand, when the drawings accompanied the
name for which a connotative definition had to be contrived, the sequence was no
less efficient than other exposure methods.

The remainder of the statistically significant differences involved only recall scores 
for names and connotative definitions, since the recognition scores for these verbal 
materials in Design C, as already stated, were virtually perfect. In the second 
conclusion presented above, one factor affecting their recall has been isolated, viz., 
the role of the denotative definition. Two other factors were important: recall 
depended upon whether the verbal materials had functioned during exposure as 
stimuli or responses and whether they were names or connotative definitions. 
The three factors so completely interacted that no one of them can be discussed 
without simultaneous reference to the other two. A summary of the results, 
consequently, must be embodied in a single table. 


TABLE 1

                          Names                Connotative          Total
                                               definitions
                          (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)
Role of denotative        S      R      Total  S      R      Total  S      R      Total
definition

(1) S: designated         7.2    10     9.1    5.0    9.5    8.1    6.1    9.8    9.0
(2) R: designated         5.8    x      5.8    3.7    x      3.7    4.8    x      4.8
(3) Total                 6.5    10     8.3    4.3    9.5    6.9    5.3    9.8    7.6
(4) R: contrived          [?]    x      [?]    9.2    x      9.2    8.3    x      8.3
(5) None                  6.7    8.5    7.6    9.2    7.6    8.4    7.9    8.1    8.0
(6) Total                 7.0    8.5    7.5    9.2    7.6    8.7    8.1    8.1    8.1


Mean recall scores of names and connotative definitions functioning during exposure 
as stimuli or responses and with different roles performed by the denotative definition. 


All the figures in Table 1 are mean recall scores based upon a maximum score 
of 10. Both the rows and columns are numbered so that hereafter brief references 
can be made to them. The rows indicate the role of the denotative definition during 
exposure: rows 1-3 are from Design A, rows 4-6 from Design B. The principal division 
of the columns is between names and connotative definitions, and these in turn are broken 
down by function as previously exposed stimuli (S) or responses (R). With few 
exceptions each mean is derived from a single experimental group and hence is based 
upon only six subjects. The symbol x in the body of the table signifies that a group
in that cell was not experimentally feasible ; for example, subjects instructed to find 
the correct drawing and thus designate a denotative definition (row 2) could not also 
be told to contrive a name (column 2). 
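Where each cell rests on one group, a total column is simply the mean of the cells it spans. Row 5 (Design B, no denotative definition) appears to be such a row. Each of its four cells corresponds to one exposure-testing pairing. The sketch below recomputes columns 3, 6 and 9 of that row from its four cell means. The names `row5` and `marginal` are hypothetical. Rows that pool cells containing several groups would need weighting by group count instead.

```python
# Row 5 of Table 1 (no denotative definition): one group mean per cell.
row5 = {
    ("names", "S"): 6.7,
    ("names", "R"): 8.5,
    ("cd", "S"): 9.2,
    ("cd", "R"): 7.6,
}


def marginal(cells, material=None, function=None):
    """Unweighted mean of the cells matching the given material and/or function."""
    values = [
        v for (m, f), v in cells.items()
        if (material is None or m == material) and (function is None or f == function)
    ]
    return sum(values) / len(values)


print(f"{marginal(row5, material='names'):.1f}")  # 7.6 (column 3)
print(f"{marginal(row5, material='cd'):.1f}")     # 8.4 (column 6)
print(f"{marginal(row5):.1f}")                    # 8.0 (column 9)
```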

The data of Table 1 support only one conclusion of general significance: when 
a denotative definition was designated during exposure, verbal materials that had 
been functioning as responses were more efficiently recalled than those functioning 
as stimuli, and stimulus names were recalled more efficiently than stimulus connotative 
definitions. The most concise evidence for the conclusion can be found in the 
summary of Design A (row 3), but it is important to note that the same trends 
appear when the denotative definition was a designated stimulus (row 1) or a 
designated response (row 2). These trends, being based upon a small number of 
cases, however, are not always significant; for example, the difference between 
the recall of stimulus names and connotative definitions in row 1 does not quite 
reach the 0.05 level (χ²). In addition, row 2 contains no groups in which names
and connotative definitions functioned as responses during exposure. When the 
role of the denotative definition is thus restricted, the third hypothesis of the 
investigation may be said to have been verified. 

The two relations indicated in the third conclusion above and the one suggested 
by the third hypothesis did not appear in Design B either when the denotative 
definition was contrived by the subject (row 4) or when it was totally absent (row 5). 
In fact, if the two roles of the denotative definition may be combined (row 6), quite 
a different conclusion must be drawn, for under those conditions verbal materials 
functioning as responses during exposure were not recalled more efficiently than 
those functioning as stimuli, and as stimuli, connotative definitions rather than names 
were more efficiently recalled (p < 0.05). The responsibility of the denotative
definition for the differences can be proven directly by comparing, for example,
the results obtained in its presence as a stimulus (row 1) with those obtained in its
absence (row 5). Evidently as a stimulus the denotative definition decreased the
recall of stimulus material (column 7; p < 0.05) and increased the recall of response
material (column 8; p < 0.05); and it may have had that effect because of its
deleterious influence upon stimulus connotative definitions (column 4; p > 0.05).


DISCUSSION 


The outstanding finding of the investigation involves the denotative definition 
that enabled referents to be designated or contrived: it played a supremely 
important role under the experimental circumstances which provided low motivation 
to learn easily accessible verbal materials. Regardless of initial exposure conditions, 
these referents could be recognized or recalled with almost perfect accuracy. Even 
when no referent had been previously perceived or contrived while providing names 
for connotative definitions, drawings which were seen later for the first time could 
be identified, again almost perfectly, on the basis of recalling the verbal material.
Only after connotative definitions had been contrived for names did such identification 
become poor. In addition, the experience with referents during exposure affected 
markedly the recall of the accompanying names and connotative definitions. 


It seems probable that the simple drawings, the non-verbal or “real” factor in 
the experimental situation, may have been more striking, useful, or compelling than 
the verbal materials. When the subjects were offered both drawings and verbal 
materials as stimuli (Table 1, row 1), for example, they may have observed the 
referents more carefully and hence have tended to neglect the accompanying names 
or connotative definitions. The verbal stimuli, however, were the only clues provided 
in the absence of the drawings (rows 4 and 5); perforce they—and especially the 
more explicit connotative definitions—were carefully observed. On the other hand, 
when the response during exposure consisted of designating the correct drawing 
(row 2), the subjects may have glanced at the equally critical verbal stimuli, and 
then have concentrated their attention upon the response itself. Such an explanation, 
it is admitted, is sheer speculation but illustrates the complexity of the problem. 


As a result of the investigation the three original hypotheses must be severely 
revised. The first, which anticipated that denotative definitions in general and 
contrived denotative definitions in particular would be more readily learned than, 
respectively, connotative and designated denotative ones, received no support whatsoever. 
Before being abandoned, however, it ought to be tested over a period of time 
longer than the six minutes or so which elapsed between exposure and testing in 
this investigation. Then, too, the more sensitive method of recall might be used in 
comparing the two kinds of denotative definitions ; the present design employed 
only recognition. Under such different circumstances, the advantage of additional 
mental or physical activity might conceivably manifest itself. 

The second hypothesis asserted that responding to a name with a contrived
connotative definition would be an inefficient learning method. Under the experimental
conditions here studied that method was inferior not only to its converse but also,
contrary to expectation, to a third exposure method. In addition, the hypothesis as
originally stated did not recognize the possibility that an accompanying denotative 
definition could raise the efficiency of an inferior exposure method. In passing it 
is well to state that the evidence adduced in favour of the second hypothesis as well 
as that for the third does not necessarily vindicate the particular psychological 
assumptions on which they were based. 

The third hypothesis, it has become evident, was valid only when the method 
of recall was used. As stated, furthermore, it failed to specify the role of the 
denotative definition during testing and to distinguish between names and definitions. 
The interaction which determined whether a response was learned more readily than 
a stimulus, moreover, doubtless involved other factors which were not isolated or 
measured in the experiment. Unlike most of the responses to the drawings, for 
example, all the verbal responses were contrived; would different results have 
been obtained if they had been designated? Or consider the intricate problem
which arises when the three hypotheses are used to compare two exposure sequences: 
(1) name: connotative definition and (2) connotative definition: name. According 
to the third hypothesis, the connotative definition in the first sequence should 
be learned more rapidly since it is functioning as a response; and a variant of 
the first hypothesis would also support the prediction if the definition is contrived. 
The second hypothesis, in contrast, suggests exactly the opposite: the connotative 
definition in the second sequence should be learned more readily since there it is 
functioning in the “unusual” role of stimulus. It may be that, in the absence of
a denotative definition provided by the investigator, these two tendencies cancel one 
another ; but that, in the presence of such a stimulus, a connotative definition in 
its unusual role as stimulus perhaps can be ignored and so such a definition will 
be learned more readily only while functioning as a response, especially as a contrived 
one. 

Do all the complications of this deceptively simple experimental situation, which 
produced incidental learning of easily accessible materials, reveal practical
rules of thumb? If forced to reply and if permitted to stress the usual disclaimer of
other-things-being-equal, the investigator would reluctantly state: 

(1) If people are to recall their verbal responses, it is probably better to provide 
them with a referent as stimulus ; 

(2) If they are to recall the verbal stimuli to which they respond, perhaps it is 
better not to give them a referent but instead either to make them contrive one 
or else to have them function without any attention to a denotative definition ; 

(3) If they are merely to recognize verbal materials or to recognize or recall 
referents, it scarcely matters how they are treated verbally. 


The author wishes to thank Professors Claude E. Buxton and Carl I. Hovland who 
have been extremely helpful; they have of course no responsibility for details or 
the outcome of the investigation. 
DIFFERENCES BETWEEN THE CHILD AND THE APHASIC


ROBERT W. ALBRIGHT
Arizona State College 


Comparison between children and aphasics shows more differences than similarities 
in their language and behaviour. Just as atrophy in plants or animals may bear 
false resemblances to normal stages of growth, so the pathology of aphasia is not 
to be understood in the same terms as child development. 


Similarities between the speech of brain injured adults and that of children have 
been noted by many writers. One of the founders of child psychology and embryology
stated that the speech disturbances in the aphasic “. . . are not merely somewhat
similar in general, but identical with those of the child just learning to speak” 
(Preyer, 1893). More recently, the dissolution of speech in aphasia has been 
described in terms of its mirroring, in reverse order of course, the evolution of 
speech in the child (Jakobson, 1941). Jakobson also claims that in aphasic rehabilitation
the phonemes are re-established in the same order that they are acquired by
the child. 

Some writers such as Henry Head have been more cautious in drawing inferences 
from the similarities between the speech of the brain-injured and the child. He 
summarized his position thus: 


The form of behaviour we call the use of language has a history, and many of 
the phenomena of disordered speech resemble steps by which the complete act 
was acquired in each individual. The patient may revert to a more primitive mode 
of response. Apart, however, from the tendency to adopt such methods of 
executing some particular task, the abnormal manifestations do not strictly 
correspond to any stages in the historical evolution of speech (Head, 1926). 


Critchley (1952) has also expressed scepticism concerning the claims made by 
other writers that there are extremely constant phonemic changes in the language 
systems of aphasics. He emphasizes the extreme variation shown by aphasics in 
using the phonemes of a language, a variation that casts doubt upon theories that 
assume regularity of phonemic change in aphasia. 

Granted that similarities may be found between aphasia and child language, this 
paper is written from the point of view that such similarities are more superficial
than real, and its purpose is to call attention to some of the differences. 
LINGUISTIC DIFFERENCES


Many aphasic adults have a much larger vocabulary and more language comprehension
than a young child. Besides, they usually retain linguistic patterns which
do not require a high degree of symbolic formulation and expression: exclamations, 
automatic or emotional expressions, and profanity (Head). These habitual speech 
patterns are not found in such quantity in the language of young children. Also, 
the sound-systems of child language do not have the conventional patterning which 
is often found in the language of aphasics (Albright and Albright, 1956). 

In what Grewel (1951) has called the “dimensions” of the language system, 
differences can be found. He defines language as a “polydimensional system of
signs” and breaks it down into five sub-systems: (1) a system of distinctive sound-
elements or phonemes, (2) a system of words—phonetic-semantic units, (3) a system 
of possibilities of word-formation, (4) a system of possibilities of sentence-formation, 
(5) a system of accents (pitch, stress, and length). 

According to Grewel’s framework, one might say that the aphasic had more 
difficulty in sub-systems (3) and (4) of a language and less in sub-systems (1), (2) and 
(5). The child, however, would present a different picture with more difficulty 
in sub-systems (1) and (2), and less in (3), (4) and (5). These differences in the 
“dimensions” of language mean that the aphasic can usually pronounce sounds
and words, but cannot formulate them into sentences. The child, on the other hand, 
can formulate sentences in his “little language”, but has trouble producing the 
sound-patterns of the conventional language. 

On the hearing side of language, one often finds that an aphasic with extremely 
impaired expression can note slight phonemic distortions in a sentence spoken to him. 
An aphasic patient of this writer’s who had severe expressive impairment detected 
the substitution of /o/ for /e/ in the word tell in the following sentence: If your 
wife is able to come down next Sunday, I'll tell her about the work you and I have 
been doing. Certainly, young children do not have such a highly developed capacity 
for phonemic hearing. 

Other differences appear in the language habits of the aphasic and the child.
Many aphasics who can speak, read, and write have difficulty reciting or arranging
the letters of the alphabet (Head). One of the aphasics with whom this writer worked
recently could do arithmetic problems with speed and accuracy on paper, although she
could not count beyond five with spoken words. It is interesting to note that
she could work problems orally only when the clinician kept repeating the number
formula: “What’s nine times seven? Nine times seven? Nine times seven?”,
a method which kept the symbols before her in a way similar to the constancy
furnished by the printed page. She could also write numbers easily and correctly
up to one million. Young children, who can be taught to count aloud to ten with
little difficulty, do not exhibit such abilities, nor such unevenness in their abilities.

It is in the functioning of the spoken language that the aphasic usually suffers the
most difficulty and in which he probably exhibits the greatest differences in comparison
with the child. As mentioned above, many aphasics can read, write, spell, and 
count much better than they can speak. They usually have more of the structure 
of the language than the child has in terms of sounds, words, and patterned expressions 
such as “I don’t know,” “I can’t get it,” etc., but they have more difficulty setting 
the structure into motion. The aphasic’s trouble seems to be mainly one of putting 
things together and keeping them apart, of linguistic analysis and synthesis. The 
relations among sounds, meanings, and situations create far more interference in 
the speech processes of the aphasic than do the “Spoonerisms” of normal children,
and, one should add, of “normal” adults. Related associations disturb the speech
processes in all of us, but in the aphasic these somewhat chaotic processes underlying
“inner speech” tend to remain chaotic in his “outer speech”. Lashley’s contribution
to the Hixon Symposium on cerebral mechanisms provides an interesting outline of 
the processes involved in formulating, directing, and ordering the serial behaviour 
which is disturbed in the aphasic’s speech (Lashley, 1951). 

A verbatim record of a patient illustrates the sharp contrast that may exist in 
aphasia between the accuracy of the conventional speech forms and the disorder
on the syntactic and semantic levels of the language system: 

“ Well I will tell you as clearly as I can because it was not very clearly to me. 
Well, it was done out of, I was not in order and nothing happened to me at all 
because it was too . . . well, I was not careless, I was not able to give you the 
information because I was half ill, very, well, fairly ill you see, due to, what do 
you call it, I think . . . I was very well, I do not realize what it was at all, because 
I cannot realize what was happening. I think it was done out of your orders 
here, won’t I, yes, I do not think I could account for it, it was simply . . . I was 
ill and it was not in a condition to account to you for the trouble I am afraid... ” 
(Critchley, 1952). 

The above indicates that the patient understands what he wants to say, and has 
the sounds, words, and phrases of the language but cannot organize them into 
meaningful, linguistic sequences. It is doubtful if such a sample of discourse could 
be obtained from a child. 


SOCIAL AND PSYCHOLOGICAL DIFFERENCES 


The brain-injured adult has suffered a severe setback ; he must adjust to his 
changed relations with the world on a level that is usually inferior to and quite 
different from his former level. For most aphasics, the total life-situation is one of 
a person invalided in various ways and to varying degrees. The child, on the other 
hand, is moving ahead in a developmental mastery of the surrounding world that few 
of us ever match during our later years. 

Serious changes in the aphasic’s relations to his job, family, and the rest of his 
cultural environment mean that his behaviour no longer fits into the world like that 
of a normal adult. The aphasic usually has a good deal of difficulty in formulating 
and carrying out behaviour that is appropriate for an adult. Patients may have
trouble in arranging concrete situations such as setting a table (Head, 1926). 

The child’s behaviour, though not always appropriate from an adult point of view, 
is appropriate to the child and his relations to the world. He is a child and behaves 
most of the time like a child ; the aphasic is an adult but, unfortunately, his behaviour 
does not fit into adult situations with the same degree of appropriateness. 

One of the greatest differences is the restriction of behaviour which the aphasic 
suffers and which represents a great loss in his individual freedom of thought and 
action. A child also has restrictions upon his behaviour, but they are of a different 
kind. For one thing, most of the child’s restrictions exist outside of himself in 
the form of prohibitions of various kinds, while the aphasic’s exist mainly within 
himself in the form of damaged symbolic processes. The child’s reaction to external 
obstacles tends, consequently, to be one of resentment and irritation while the aphasic’s 
reaction to internal interferences in thought and language is more often one of 
discouragement and anxiety. One does not, for example, find euphoria and depression 
in the behaviour of a normal child, although these abnormal psychological conditions 
do occur in aphasics (Eisenson, 1949). So far as “insight” into their behaviour is 
concerned, the child who exercises “insight” is probably reminded of being a 
child, the aphasic of being a brain-injured adult. 

Considering the psychological differences between them, it seems justifiable to 
attribute to the child a greater ability to assume the appropriate “set”
toward a task and to retain that “set” in ordering, directing, and integrating
his behaviour, especially his symbolic behaviour. 

In summarizing the preceding discussion, it should be stressed that the similarities 
often noted in the speech of the child and the aphasic do not justify the assumption 
of identical processes in their thought and language. Their life-situations are not 
the same, one being a child, and the other a brain-injured adult. Also, the aphasic 
usually exhibits far more unevenness in the level of skill shown in all of his habits,
including those of language. Finally, the disintegration and disorder of the speech 
processes in aphasia contrast sharply with the order and growth that characterize 
child language. 
AN ANALYSIS OF STRUCTURED CONTENT, WITH
APPLICATION OF ELECTRONIC COMPUTER RESEARCH, 
IN PSYCHOLINGUISTICS 


THOMAS A. SEBEOK AND V. J. ZEPS
Indiana University 


Certain linguistic and psycholinguistic hypotheses require information readily 
derived only from the processing of numerous texts. The employment of electronic 
computers makes the testing of such hypotheses reasonably feasible; the machines
also tend to relieve the researcher of the burden of compiling lists and reference indices 
“manually.” We have tested programming possibilities on a pilot sample. Two 
programmes which have proved successful are reported below: one designed to 
tally co-occurrences of units within frames, the other to yield inventories of units. 


INTRODUCTION 


Four thousand relatively short texts are being processed for storage on IBM cards,
and programmes are being designed for information retrieval from a basic body of texts
by means of electronic equipment. The types of information desired range from 
content analyses of individual texts to the discovery of the rules by which the 
messages under consideration have been encoded. At the present stage, we have 
accomplished the storage of a pilot sample, for which information retrieval programmes 
have been prepared. 
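The summary above names two retrieval programmes: one tallies co-occurrences of units within frames, the other yields inventories of units. As a rough modern analogue, not a reconstruction of the IBM 650 programmes, the sketch below treats each frame (assumed here to be, say, one line of a pre-edited text) as a list of coded units. From those frames it produces both outputs. The unit labels are placeholders, not Cheremis data, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def inventory(frames):
    """Inventory of units: how often each unit occurs across all frames."""
    return Counter(unit for frame in frames for unit in frame)


def co_occurrences(frames):
    """Tally unordered pairs of distinct units occurring in the same frame.

    Each pair is counted at most once per frame.
    """
    tally = Counter()
    for frame in frames:
        for pair in combinations(sorted(set(frame)), 2):
            tally[pair] += 1
    return tally


# Placeholder frames: each list stands for one frame of coded units.
frames = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "C", "C"],
]
print(inventory(frames).most_common(1))    # [('C', 4)]
print(co_occurrences(frames)[("A", "C")])  # 2
```

Sorting each frame's distinct units makes ("A", "C") and ("C", "A") the same key, so the tally is symmetric by construction.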

The theoretical background for this research was sketched in Psycholinguistics:
A Survey of Theory and Research Problems (Osgood and Sebeok, 1954; see also Osgood,
Suci and Tannenbaum, 1957; Osgood and Anderson, 1957). In following up certain
lines of investigation suggested in that monograph, it became clear that many of the
hypotheses, while suggestive and certainly feasible in theory, are impracticable, save
with the use of digital computers.¹ To investigate operational possibilities, whereby
promising hypotheses in psycholinguistics and related fields could be empirically
tested and the operational procedures involved successfully carried out, we have
undertaken a pilot project in the use of electronic equipment for the retrieval of
structured content and mass-processing of coded material, as set forth below.²

¹ Our experience thus tends to support G. A. Miller’s sentiment: “In moments of enthusiasm
I sometimes wonder whether the digital computer may not be as important to the social
sciences as the microscope was to the biological sciences”; cf. Social Science Research
Council ITEMS, Vol. 41 (1957), p. 44. Miller also cites applications in linguistics along
quite different lines.


Corpus 


The corpus consists of four thousand short texts (folksongs) in Cheremis, the
language chosen for purposes of demonstration.³ These texts constitute a corpus
which is uniform in some respects (literary genre, period, and general culture) and 
diverse in others (source of transmission, manner of transcription, dialect of encoder). 


STORAGE 


The choice of a method of storage has been conditioned by available equipment 
and the type of information to be retrieved. The corpus has been stored on IBM 
cards, to be processed by an IBM 650 magnetic drum data processing machine. The 
pre-editing has included the retention of such information as is readily apparent to
a trained analyst.


(1) Classification of texts. 

The texts were initially classified by source of transcription, which for all other 
purposes is an incidental way of classification. Other possible classifications (e.g., 
by content, literary form, dialect of encoder, function of the message, or the like) 
are objects of analysis and hence information to be retrieved. 


(2) Pre-editing. 

The pre-editing utilizes an auxiliary code to indicate form-classes of linguistic 
units, juncture hierarchies between such units, syntactic function of the same, and 
such metric organization as is readily apparent. 


2 For reports on another project in a somewhat related area, see R. Busa, “I principali
problemi dell’automazione del linguaggio scritto” (Rome, 1957); and “Die Elektronentechnik
in der Mechanisierung der sprachwissenschaftlichen Analyse”, Nachrichten für Dokumentation,
Vol. 8, No. 1 (1957). It is interesting to note that, although the research reported on in these
papers differs both as to motivation and as to detailed application, our respective programmes,
though developed independently, are strikingly similar. Also spurred by “the need for better
structural understanding of language in many data processing applications,” the National Bureau
of Standards has applied similar techniques in the study of some aspects of English sentence
formation; cf. “Electronic Computer Study of English Syntax patterns”, NBS Technical
News Bulletin, Vol. 41, pp. 84-86 (1957).


3 Cheremis is a Uralic language, spoken by a minority group of about one-half million in 
East Central European USSR. For further information on the Cheremis language and 
culture, see the six volumes hitherto published in the Studies in Cheremis series, especially 
Sebeok, Vol. 5, The Cheremis (New Haven, Connecticut, 1955). 
It is understood that the extracting of information from structured material
involves making the structure explicit at either the data level or the programme level
of processing. Depending on the initial decision, the programme either has to rely
on signals inherent in the structure (the approach adopted by most researchers in
mechanical translation) or can also utilize an auxiliary coding introduced at the
pre-editing level. Either decision has immediate disadvantages. The first type of
procedure requires little work from the pre-editor, but presupposes an exhaustive
knowledge of the system (this, however, is one of the things we are trying to discover
from machine processing). It involves a long programme and poses storage
problems for machines with limited memories. The other type, on the other hand,
requires a large number of low-level decisions from the pre-editor, but requires neither
as thorough an acquaintance with the material nor as spacious a programme. A third
alternative is available if it is known for certain what kinds of questions are not
going to be asked concerning the corpus. In that case, simpler procedures may be
satisfactory.

While the exact physical shape of the code and auxiliary code is, in principle— 
though not, of course, in practice—irrelevant, we give it below as an illustration, 
along with a sample sentence in various stages of pre-editing. 

The basic coding consists of alphabetic and other characters as well as the numerals 
1 and 2. The coding has a one-to-one relationship to the linear phonemes of the 
corpus. The basic coding is augmented at immediate constituent boundaries by the 
auxiliary coding. The latter utilizes the numerals 3 to 9 and occupies one to four 
positions in the composite code. The first position indicates the hierarchical level
of the immediate constituent boundary; the second, if filled, the major form-class
of the preceding word; the third, if filled, the function of the clause constituent; and
the fourth, if filled, the end of a line.


BASIC CODE

Phoneme   Code character          Phoneme    Code character
a         A                       r          R
b         B                       š          S
č         C                       t          T
d         D                       u          U
e         E                       ö          V
v         F                       ü          W
g         G                       s          X
n         H                       ə          Y
i         I                       z          Z
1’        J                       8          /
k         K                       Zz or x    Zero
l         L                       é€ or      1
m         M                       3          2
n         N                       a          12 overpunch
o         O                       z          Blank
p         P
n         Q

Note: Dialects which have Z or ¢ do not have x or € respectively.
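The basic coding stands in a one-to-one relation to the phonemes, so it can be treated as a reversible character substitution. The Python sketch below illustrates this under a stated assumption: it covers only a partial table. That table holds the rows confirmed by the sample sentence given later in this section (č as C, ü as W, ə as Y), plus the plain letters that are coded by their own capitals. The names `to_code` and `from_code` are hypothetical. Any phoneme outside the partial table is rejected rather than guessed.

```python
import unicodedata

# Partial phoneme -> code-character table (an assumption: only rows that are
# legible and confirmed by the sample sentence; all other phonemes omitted).
PHONEME_TO_CODE = {p: p.upper() for p in "abdegiklmnoprtuz"}
PHONEME_TO_CODE.update({
    "\u010d": "C",  # č
    "\u00fc": "W",  # ü
    "\u0259": "Y",  # ə (schwa)
})
# Inverting the table is only safe because the mapping is one-to-one.
CODE_TO_PHONEME = {code: ph for ph, code in PHONEME_TO_CODE.items()}
assert len(CODE_TO_PHONEME) == len(PHONEME_TO_CODE)


def to_code(phonemic: str) -> str:
    """Recast a phonemic transcription into machine-acceptable characters."""
    text = unicodedata.normalize("NFC", phonemic)  # compose ü, č from parts
    # Spaces are kept as word separators; unknown phonemes raise KeyError.
    return "".join(ch if ch == " " else PHONEME_TO_CODE[ch] for ch in text)


def from_code(code: str) -> str:
    """Invert to_code, recovering the phonemic transcription."""
    return "".join(ch if ch == " " else CODE_TO_PHONEME[ch] for ch in code)


print(to_code("ium\u0259n k\u00fck\u00fc a\u010dam ulo"))  # IUMYN KWKW ACAM ULO
```

The sample sentence is printed in lower case in the running text, while Table 1 uses capitals. The sketch follows the table.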
AUXILIARY CODE


First                                   Second

3 — morpheme boundary                   3 — substantive
4 — stem boundary                       4 — verb
5 — word boundary                       5 — particle
6 — phrase boundary                     6 — expressive form
7 — clause constituent boundary
8 — clause boundary
9 — sentence boundary


Third                                   Fourth

3 — subject                             The fourth numerical position is
4 — complement                          reserved for metrical coding, if
5 — predicate                           necessary.
6 — adverb(ial phrase)
7 — not assigned
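The numerals 3 to 9 never occur in the basic coding, which uses letters, other characters and the numerals 1 and 2. A composite string can therefore be segmented simply by cutting after every run of those numerals. The Python sketch below assumes that reading of the scheme. The names `Unit` and `split_composite` are hypothetical. The sketch reports the fourth position as a raw digit, because the text describes it both as marking the end of a line and as reserved for metrical coding. Applied to the pre-edited sample sentence iumy4n53kwkw63aca4m733ulo8453, it yields six units.

```python
import re
from typing import NamedTuple, Optional

# Auxiliary-code tables transcribed from the listing above.
BOUNDARY = {"3": "morpheme", "4": "stem", "5": "word", "6": "phrase",
            "7": "clause constituent", "8": "clause", "9": "sentence"}
FORM_CLASS = {"3": "substantive", "4": "verb", "5": "particle",
              "6": "expressive form"}
FUNCTION = {"3": "subject", "4": "complement", "5": "predicate",
            "6": "adverb(ial phrase)"}  # 7 is listed as "not assigned"


class Unit(NamedTuple):
    """One stretch of basic code plus the auxiliary code that closes it."""
    segment: str                # basic-code characters (letters, 1, 2, ...)
    boundary: str               # first position: level of the boundary
    form_class: Optional[str]   # second position, if filled
    function: Optional[str]     # third position, if filled
    fourth: Optional[str]       # fourth position, kept as a raw digit


def split_composite(composite: str) -> list[Unit]:
    """Cut a composite code string after every run of the numerals 3-9.

    Hypothetical helper: characters after the last auxiliary code are
    ignored, since a well-formed text ends on a boundary code.
    """
    units = []
    for match in re.finditer(r"([^3-9]*)([3-9]+)", composite):
        segment, aux = match.groups()
        if len(aux) > 4:
            raise ValueError(f"auxiliary code longer than four positions: {aux!r}")
        pos = aux.ljust(4)  # pad unfilled positions with blanks
        units.append(Unit(
            segment=segment,
            boundary=BOUNDARY[pos[0]],
            form_class=FORM_CLASS.get(pos[1]),
            function=FUNCTION.get(pos[2]),
            fourth=None if pos[3] == " " else pos[3],
        ))
    return units


for unit in split_composite("iumy4n53kwkw63aca4m733ulo8453"):
    print(unit)
```

For example, the final unit is ulo, closed by 8453: a clause boundary after a verb functioning as predicate, with 3 in the fourth position.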


While being pre-edited, a sentence undergoes the following stages: in broad
phonetic transcription, jumən kükü at’šam ulo; in phonemic transcription, iumən
kükü ačam ulo; recast into symbols that the machine will accept, iumyn kwkw acam
ulo; and finally, with the auxiliary numerical coding added, iumy4n53kwkw63aca4m733ulo8453.
At this point, the pre-editor feeds the text into the machine for translation into a
bi-quinary code for internal analysis.
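The bi-quinary code mentioned here is the IBM 650's internal form for decimal digits. Each digit is held as a "bi" part (0 or 5) plus a "quinary" part (0 to 4), with exactly one indicator on in each group. A valid digit therefore always shows two of seven bits, which makes the code self-checking. The Python sketch below shows only this digit-level encoding and its check; how the 650 represented alphabetic characters is not modelled. The function names are hypothetical.

```python
def to_biquinary(digit: int) -> list[int]:
    """Encode one decimal digit as seven indicator bits.

    The first two bits stand for the values 0 and 5 (the "bi" part), the
    last five for 0-4 (the "quinary" part); digit == 5 * bi + quinary.
    """
    if not 0 <= digit <= 9:
        raise ValueError("bi-quinary encodes single decimal digits only")
    bi, quinary = divmod(digit, 5)
    return [int(i == bi) for i in range(2)] + [int(i == quinary) for i in range(5)]


def from_biquinary(bits: list[int]) -> int:
    """Decode seven bits, rejecting any pattern without one bit per group."""
    if len(bits) != 7 or sum(bits[:2]) != 1 or sum(bits[2:]) != 1:
        raise ValueError(f"invalid bi-quinary pattern: {bits}")
    return 5 * bits[:2].index(1) + bits[2:].index(1)


print(to_biquinary(7))  # [0, 1, 0, 0, 1, 0, 0], i.e. 5 + 2
```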


RETRIEVAL 


The choice of a method of retrieval depends upon the type of information desired. 
In some instances the necessary information may be retrieved by the use of the 
sorter, or other relatively simple equipment. For other information, it is necessary 
to design extensive programmes or modify, in varying degrees, programmes developed 
for our pilot project. 

The vast majority of our programmes are yet to be written; we expect our future
operations to proceed as follows. We propose to utilize the programmes we have
available now for generating hypotheses from our pilot sample of one hundred
texts. Once hypotheses have been generated, one of the existing programmes will
be utilized to retrieve the necessary information, or a new or partially new programme
will be written. Eventually we shall arrive at a basic battery of programmes which
will be used for the retrieval of information needed to test a very large number of
hypotheses, ranging from simple to complex, and from purely linguistic and
psycholinguistic hypotheses to others extending into allied fields of investigation.
Likewise, we will extend the scope of our operations to all four thousand texts, if
required by the nature of a hypothesis.


TABLE 1
IDENTIFICATION

TEXT SOURCE TEXT CARD
RUZMY4HS3ZBRUGESZINAT4HSIAKPYS3SZPWHC 5 182 1
Y4MB34U3ZYSTZYMYSMS3YST4HYSZNAT4ES 5 182 2
OLAZCAS3PVRT4YNSZ4ZOLAZBCA5ZSPVR 5 182 3
T3VT63KVRGV4SY3ZY7363NYL4NSSIK 5 182 4
63PHWLEMH4ETZYM734PWL4YHYS3Z3NAB4HS3N 5 182 5
YL4AN53 IKS3PWLEM63KVRGV4SY32ZY73 5 182 6
63SI53WXTELGSYU7Z4SYND4AYSINAB4HS 5 182 7
ZSSISZWXTELYETEZSUMBAL4Y ANZ ZY7363 5 182 8
SIS3TATIQGA4YM7T3R4SYND4YSZNAB4S53 5 182 9
SIS3TAIJIQGA4TESWMBALHYAN3I2ZY7363 5 182 10
NAGORNYIS3KOLG4ET7T3Z4PYTHYSZINASBS 5 182 11
S3NAGORNYIS3KOLGETEZSZPWCKZAS4HYZ 5 182 12
Y7343NAGORNYI53 KW 4YETITZSZKHLYES 5 182 13
BLS53NAGORNYIS3 KW Y63KUC3ZBAS4HYZY 5 182 14
7343O0ONDRISZERGY63MIGLAI733KWL4 5 182 15
ES8453ONDRIS3RERGY63MIGLAI4ZYSBL. 5 182 16
AN7Z343 ITEPIM5S3WDYRE63MASA733 KWL4 5 182 17
SS84SRIEPIMS3ZWDYRE63MASA4HZZLAN? 5 182 18
ZKLREMOXKO4GYCSZSTOLYSYE6E3PITYR4GY 5 182 19
CSZTOLYSY6383INDESSZ3SHYURZSNZANE63 5 182 20
LUPS733KWL4ES9453 5 182 21


Sample pre-edited text.

Two programmes have been tested to date, designed for two main purposes: to 
yield co-occurrence tallies of units within frames, and to print unit inventories. The 
tallies and inventory items were further manipulated by peripheral equipment (sorter,
duplicator, tabulator, and the like) to make them more useful as reference material
for testing our hypotheses.

In either programme, a piece of text is shifted through the accumulator in a
continuous manner; the operation of logical branching forms the backbone of the
programmes. The two flow charts in Figs. 1 and 2 represent the process graphically,
in somewhat simplified form.


(1) Co-occurrence tally programme. 

The co-occurrence tallies are mainly intended for use in calculating—with the 
aid of a supplementary programme—such contingency ratios as are discussed in 
two other papers (Sebeok, 1957, and Saporta and Sebeok). 

The final shape of the contingency analysis permits us to ask (and answer)
questions of the following order: “Given an occurrence of unit A anywhere in a
frame x, what is the expected probability of B occurring anywhere in the same
frame x, and how does the actual occurrence contrast with the expected?” The
contingency ratio $r_x$ at the hierarchical level of frames $x$ is the ratio of an actual
frequency $a_x$ to an expected frequency $e_x$:

$$ r_x = \frac{a_x}{e_x} $$

The calculation of $e_x$ can be easily accomplished only under a very restricted set
of conditions. One such condition is large frame length. In such a case, the occurrence
of a word A in any one of the slots within the frame does not appreciably decrease the
number of slots available for the occurrence of the word B. For short frames, however,
the complexity of the statistics involved in calculating the exact probability is
prohibitive, and approximations have to be made. The desired set of interrelated
matrices of the ratios $r_x$, for all levels of frames and all relevant units A-W, can be
pictured as a solid of data, as shown in Fig. 3.


Fig. 1. Flow chart for co-occurrence tally programme.

Fig. 2. Flow chart for unit inventory programme.
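A minimal Python sketch of the tally and the ratio $r_x = a_x / e_x$ follows. It rests on assumptions not stated in the text:

- each frame at one level $x$ is given as a list of units;
- a unit counts once per frame, however often it recurs there;
- $e_x$ is estimated by the crude independence formula $n_A n_B / N$, where $N$ is the number of frames and $n_A$, $n_B$ are the numbers of frames containing A and B.

That estimate is exactly the kind of approximation the text says short frames force on the analyst, and it is not the authors' own calculation. The names `tally` and `contingency_ratio` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tally(frames):
    """Count frames containing each unit and each unordered pair of units."""
    unit_counts = Counter()
    pair_counts = Counter()
    for frame in frames:
        present = sorted(set(frame))  # presence in the frame, not token count
        unit_counts.update(present)
        pair_counts.update(combinations(present, 2))
    return unit_counts, pair_counts


def contingency_ratio(a, b, unit_counts, pair_counts, n_frames):
    """r = actual / expected co-occurrence of units a and b in one frame level."""
    actual = pair_counts[tuple(sorted((a, b)))]
    # Expected frames holding both a and b if they occurred independently:
    # N * P(a in frame) * P(b in frame). A rough approximation only.
    expected = unit_counts[a] * unit_counts[b] / n_frames
    return actual / expected if expected else float("nan")


frames = [["sun", "moon"], ["sun", "moon", "star"], ["star"], ["sun"]]
units, pairs = tally(frames)
print(contingency_ratio("sun", "moon", units, pairs, len(frames)))
```

Run once per frame level (word, phrase, clause constituent, clause, sentence, text), the tally gives one matrix of $r_x$ values per level. Stacked, these matrices form the solid of data pictured in Fig. 3.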


(2) Unit inventory programme. 

The information that is retrieved in unit inventories consists of the unit itself, 
a complete identification of that unit and its source, as well as the retention of the 
auxiliary coding associated with the unit. The additional retained information makes 
the unit inventories adaptable to sorter operations, besides simplifying their use in 
other programmes.
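The inventory output can be modelled as one record per unit. Each record carries the unit, its retained auxiliary code and its identification (source, text, card), which is what makes the items sortable. The Python sketch below assumes that a unit is any stretch of basic code closed by an auxiliary code, and it adds an ordinal position field of its own. The record layout, the names and the identification numbers in the example are hypothetical. Python's sort stands in for the card sorter.

```python
import re
from typing import NamedTuple


class InventoryItem(NamedTuple):
    """Hypothetical record layout for one inventory entry."""
    unit: str      # the unit itself, in basic code
    aux: str       # auxiliary coding retained with the unit
    source: int    # identification: source of transcription
    text: int      # identification: text number
    card: int      # identification: card number
    position: int  # ordinal of the unit on its card (an added assumption)


def inventory(cards):
    """Build a sorted unit inventory from (source, text, card, line) tuples."""
    items = []
    for source, text, card, line in cards:
        # A unit is taken to be any stretch of basic code closed by an
        # auxiliary code, i.e. a run of the numerals 3-9.
        matches = re.finditer(r"([^3-9]+)([3-9]+)", line)
        for position, m in enumerate(matches, start=1):
            items.append(InventoryItem(m.group(1), m.group(2),
                                       source, text, card, position))
    # Sorting on unit, then auxiliary code, then identification imitates
    # a card-sorter pass over the inventory cards.
    return sorted(items)


# Hypothetical identification numbers for two coded lines.
cards = [(5, 182, 1, "iumy4n53kwkw63aca4m733ulo8453"),
         (5, 182, 2, "kwkw63ulo8453")]
for item in inventory(cards):
    print(item)
```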
(3) Illustration.

For the sake of brevity, we give only a single example of the practical applicability of 
the unit inventory programme. 

i. Background: the occurrence of a grammatical ending has a signal value with
respect to its juxtaposition with a boundary of a linguistic frame. Thus, in English,
the genitive ending ’s is more likely to signal an attribute followed by a head (e.g.,
all the king’s men, where king’s is an attribute, and men the head of that particular
grammatical construction) than the end of a sentence (e.g., That book is my brother’s).
If grammatical endings tend to signal final boundaries of higher linguistic frames, x,
rather than lower ones, then the ratio r = g/s of frame-final grammatical endings, g,
to frame-final bare stems, s, should increase as x increases.

ii. Hypothesis : r co-varies directly with x. 

iii. Operational procedure : use the inventories of stems and endings, obtained from 
the basic set of data by the inventory programme. By means of a sorter operation, 
obtain the values of g and s for each frame x. Calculate r for each frame x. 


iv. Results of an actual run for Cheremis (a highly inflected language):


x                     r
5 (word)              0.89
6 (phrase)            1.24
7 (synt. const.)      1.59
8 (clause)            1.72
9 (sentence)          1.76
10 (text)             1.70


v. Discussion: the hypothesis holds for values of x from 5 to 8, but not for 
values 9 and 10. Either the statistical tendency of grammatical endings to signal 
higher rather than lower boundaries is restricted to levels 5 through 8, or the resultant 
curve is composed of relevant sub-sets having widely different characteristics. Thus 
the curve for nominal endings might differ radically from that of verbal endings. 
At any rate, further hypotheses are clearly indicated.⁴
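The operational procedure in iii can also be expressed directly over the composite code, without a separate sorter pass. The Python sketch below is hypothetical and rests on two assumptions not stated in the text:

- a frame-final segment counts as a grammatical ending (g) when the boundary just before it is a morpheme (3) or stem (4) boundary, and as a bare stem (s) otherwise;
- each boundary is counted once, at the level given by its first digit.

Level 10 (text) has no single-digit code, so the sketch does not handle it.

```python
import re
from collections import defaultdict


def frame_final_counts(composite: str) -> dict[int, list[int]]:
    """Tally frame-final endings (g) and bare stems (s) by boundary level.

    Returns {level: [g, s]} for levels 5-9. Assumptions (not from the
    paper): a segment is a grammatical ending if the boundary just before
    it is a morpheme (3) or stem (4) boundary, otherwise a bare stem; each
    boundary counts once, at the level given by its first digit.
    """
    counts = defaultdict(lambda: [0, 0])
    previous_level = None  # start of text: the first segment is a stem
    for _segment, aux in re.findall(r"([^3-9]+)([3-9]+)", composite):
        level = int(aux[0])
        if level >= 5:  # word boundary or higher closes a frame
            is_ending = previous_level in (3, 4)
            counts[level][0 if is_ending else 1] += 1
        previous_level = level
    return dict(counts)


def ratio(g: int, s: int):
    """r = g / s, or None when no bare stems were found at that level."""
    return g / s if s else None


print(frame_final_counts("iumy4n53kwkw63aca4m733ulo8453"))
```

For figures like those in the results table, the counts would be summed over all texts before r is computed at each level.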


APPLICATIONS AND IMPLICATIONS 


Our investigations have already generated numerous hypotheses and are expected
to produce more as the analysis is extended and evaluated. Although some of
these hypotheses are relevant only to Cheremis, and perhaps only to the genre
under investigation, many have a wider range of applicability. From the description
of association matrices, studies in symbolism, and metric studies will come applications
in cultural anthropology on the one hand, and literary theory on the other. Some
of the logic used in our programming will assuredly have bearing on mechanical
translation and on the cross-indexing and cataloguing of texts. Applications in other
related fields, such as information theory, although they cannot be visualized now, are
not ruled out.


4 Mandelbrot, in a review of W. Fucks’ Mathematische analyse von Sprachelemente, Sprachstil
und Sprache (1955), remarks “that it is very much more needed to first study the numerical
features that are common to as many texts as possible. Of course, one never knows in
advance whether a feature will be common or not: our feelings should be understood as an
encouragement to students to investigate preferably those features of language, that appear at
first glance to be common to many texts...,” Word, Vol. 13 (1957), p. 160.
Our interest in the use of computers was initially motivated by the desire to
solve, in a few years, linguistic problems which would otherwise require many
lifetimes; yet we hope and expect that research of this type will increase the
capabilities and utility of the machines. This would be a consequence not only 
of the obvious fact that the successful solution of any new type of research problem 
is inevitably accompanied by progress in the methodology, but also of the fact that, 
as we gain deeper insight into the nature and function of language, we will be able 
both to demand more of the machine and to devise better grammars for it. In other 
words, the use of extant computer techniques in linguistic analysis should eventually 
lead to the use of linguistics to increase the efficiency and further develop the 
potentialities of the electronic computer. 
FROM ARTICULATION TESTS TOWARD FORMULAS
FOR INTELLIGIBILITY 


THURSTON GRIGGS 
Bendix Aviation Corporation, Teterboro, New Jersey


From a project concerning radiotelephone messages in aviation, a procedure was 
developed for the formulation of hypotheses respecting more intelligible and less 
intelligible speech sounds. From an analysis of words with poor ratings in articulation 
tests, several hypotheses are made, and as those same words appear subsequently in 
articulation tests of phrases and meaningful clusters of words, a study is made of 
the speech sounds undetected and of the mistaken responses of the listeners. Those 
missing or misinterpreted elements then are sorted and classified, and are related, 
when possible, to appropriate hypotheses concerning particular speech sounds. 
New hypotheses thus are formulated as a guide for selecting the vocabulary of a 
language of stock-phrases or clichés for specialized use. 

It is suggested that the formulation can be further tested by means of specially 
designed articulation tests and possibly by means of a separate quantitative analysis 
combining intensity and duration of various speech sounds. Meanwhile the present 
formulation can be used in working toward immediate pragmatic improvement of 
recorded or stock messages; and operational tests can be conducted on the end- 
results achieved for various sets of fixed messages or for languages having specialized 
applications.


Particularly to those who work with them, the limitations of articulation tests are 
well known. Although there is a need to improve upon articulation testing techniques 
—and until that is done the results obtained will be somewhat variable—some study 
of the results obtained with them up to this point nevertheless might permit a 
phonetician to formulate some hypotheses with respect to intelligibility. More and 
less intelligible speech elements possibly can be delineated so that vocabularies can 
be selected and messages with fixed substance or form can be engineered for greater 
intelligibility. 

The material discussed here pertains to only one limited project connected with 
radiotelephone messages used in civil aviation (Griggs and Rulon, 1953; Griggs,
1955). In that project it was necessary to make a forced jump from articulation-test
results to language structuring. That had to be done before a standard and orthodox
methodology could develop, and some of the steps indeed were intuitive. Although
articulation-test results were not fully conclusive, an attempt was made to derive
from them a set of hypothetical principles governing intelligibility. We now
turn to the steps by which those hypotheses were derived.

The entire vocabulary was divided into (1) isolated words (most of them were 
either single-meaning units or grammatical-function designators), and (2) short 
phrases (clichés or oft-repeated word-groups). Those two categories naturally were
not mutually exclusive. The isolated words were used for articulation tests with
various groups of listeners: foreign and native speakers of English, pilots and 
aeronautical workers, and others. The phrases, sorted according to length, also were 
given in separate articulation tests to similar listeners. 


THE ARTICULATION TESTING PROCEDURE 


Because the task assignment pertained to aviation communications, it seemed 
necessary to improvise a new methodology and to modify the procedures for articulation 
testing that were being evolved in 1951. The actual procedures employed in this
project deviated sufficiently from what has since become standard practice that some
space must be taken to describe them. Since certain hypotheses regarding intelligibility
issued from the test results themselves, and since a procedure for using articulation
test results is suggested here, the methods that were employed may prove germane
to the material which follows.

The articulation tests were of two kinds: single words, and separate phrases. 
In each case the words or phrases were presented in three testing sequences or 
“segments”. The first of the three segments was conducted by live radio transmission 
and the last two by playbacks of recordings which had been made from radio receivers. 
The speakers in the articulation tests were American, Scottish, Australian, Colombian, 
French, Spanish and English in nationality. In each test one of the three speakers
was a woman.

In the first two segments the speakers were both native users of English and 
secondary users of English, and in the matters of microphone technique and signal 
strength and also in enunciation there were variables that were not controlled. The 
third segment tests were conducted only with primary users of English as speakers, 
and the recordings were first screened to ensure a fairly constant voice signal. 

In tests of isolated individual words, carrier sentences were used as follows: 

1. ————— is the word to write.

2. The word ————— is the next one.

3. The next word is —————.

The purpose of these variant positions of the word being tested was to compensate
for an inevitable loss of perception on the initial syllable whenever the word being 
tested intruded into a period of silence. When each word was read three times 
with the carrier sentences in varied sequence, each word appeared once in each 
position. It was discovered later that if polysyllables and monosyllables were separated, 
the use of the carrier sentence was less important. The use of sequential numbers to 
alert the listener might have served as well as carrier sentences in the isolated word 
lists but it was not tried for want of time. 
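The text requires that each word appear once in each carrier position over three readings. One simple way to satisfy that condition is a cyclic rotation, in which word i takes carrier (i + r) mod 3 on reading r. The text does not say how the sequences were actually varied, so the Python code below is only an illustrative sketch. The function name and the sample words are hypothetical.

```python
CARRIERS = (
    "{} is the word to write.",
    "The word {} is the next one.",
    "The next word is {}.",
)


def reading(words, r):
    """Return the carrier sentences for reading r (0, 1 or 2).

    Word i is placed in carrier (i + r) % 3, so across the three readings
    every word occurs exactly once in each of the three carrier positions.
    """
    return [CARRIERS[(i + r) % len(CARRIERS)].format(word)
            for i, word in enumerate(words)]


for r in range(3):
    print(reading(["ROGER", "WILCO", "TOWER"], r))
```

A rotation is the smallest Latin-square design. In practice the word order within each reading could also be shuffled, provided each word keeps its assigned carrier.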

Phrases were tested separately, partly to ascertain how various words bore up
in context, and partly to assess the intelligibility of various phrases one by one in
their entirety. In order to scale the phrases and words as completely as possible, a
very high level of noise interference was used. In the first two segments, static
interference from a transformer was present in the signal so that the voice-to-signal
ratio ranged from 40% to 60%. To eliminate that variation, a recording of white
noise from a VHF detector was employed in the third segment and was mixed with 
the voice signal in the desired proportion. To judge from results, this produced a 
fairly effective spread of intelligible and unintelligible speech units. Words of poor 
intelligibility in tests for segments 1 and 2 showed up with definite confirmation in 
segment 3, and since the intelligibility ratings in all three segments harmonized, it 
seems likely that test conditions did not deviate appreciably throughout. 

Intentionally included in the word lists were some words which already had been 
established as sub-standard in tests conducted in other laboratories. In the first 
segment these words showed up true to form—with deviation of only 0.015 for 
those words as a group. For subsequent tests in segments 2 and 3, additional words 
which in the preceding tests had shown themselves to be unintelligible again were 
included, partly to balance the proposed intelligible words for each succeeding 
segment, and partly as a check, as just described. 

Listeners to the tests in segment 1 were native users of English. Listeners for
segment 2 were for the most part secondary users of English, all of them pilots
or persons engaged in flight operations and available from airports in the vicinity of
New York City. Listeners to the third segment were taken from the International
Air Transport Association and the Secretariat of the International Civil Aviation
Organization, together with trainees at ICAO at Montreal. These groups listened to the
phrase tests, for which a knowledge of flight operations would prove helpful. For tests of
words in isolation, however, foreign students studying in the vicinity of Boston,
Massachusetts, all of them newcomers to English language zones, participated. The
distribution of the listeners by native language was as follows:


American English
Australian English
British English
Scottish English
Amharigna
Austrian German
Belgian French
Burmese
Chinese
Czechoslovakian
Danish
Dutch
Finnish
French
German
German-French
Gujerati
Italian
India (unspecified)
Japanese
Marathi
Norwegian
Spanish
Swedish
Urdu
Venezuelan Spanish


There were several contaminating factors that affected the results with these 
listeners. First, there was some variation in familiarity with aviation terms and 
phraseology. Second, there was variation in their familiarity with electrosonic 
communications. Third, attention and application varied according to the subjects’ 
moods and attitudes: occasional distraction or pre-occupation interfered sometimes 
with what otherwise might have been an appropriate response to a particular word. 
Fourth, some of the listeners experimented with different listening techniques at 
different points in the tests. 

Despite such difficulties, the results proved consistent as between tests for all
segments, even though the circumstances of testing had varied as has been described
here.


RESULTS 


When the test results were scored, out of approximately 300 words used in the 
first tests (preliminary ones presented here solely for illustrative purposes), the 
following three words alone were heard correctly in every case by all listeners: 


OVER 
EVERYTHING 
TRANSMISSIONS 


The following 76 words (including the three above) were heard correctly in at least 
three-fourths of the instances: 


ANYTHING MESSAGE HELLO 
AFTERNOON RECEIVING OUTER 

UNABLE ANTENNA IDENTIFICATION 
RIGHT TRANSMITTER LISTENING 
RUNWAY AGAIN ACKNOWLEDGE 


LIKEWISE NOW NEGATIVE 





RECEIVER 
UHF 
FREQUENCY 
CLEARLY 
SOMEBODY 
COMPLETELY 
TRAFFIC 
BEFORE 
VERY 
CHARLIE 
AT OUT
BROKEN ALPHABET 
NECESSARY LOCALIZER 
O.K. SIGNAL 
BAKER CHANGE 
CENTER RADIO 
INTERMITTENT WILL 

CHECK AIRWAYS 
THIS THEIR (THERE) 
CONTINUOUS READY 
KILOCYCLES ALREADY 
VHF GROUND 
SLIGHTLY NORMAL 
ABLE MEGACYCLES 
TRANSMISSIONS MAINTAIN 
COMPLETE WHEN 

GOOD CORRECTION 
VOICE SECONDARY 
UNDERSTAND 

ROGER 


The following 33 words were heard between 61 and 74% of the time: 


TRANSMISSION 
SLOWLY 

UP 
TEMPORARILY 
OPERATING 
AIRPORT 
TERMINAL 
GOING 

TRY 

HIGH 

DELAY 


CLEAR RECEIVE 
STATIC POORLY 
ALSO AHEAD 
PRIMARY CHANNEL 
YOU CORRECT 
PERFECT ITEM 

OUR MONITOR 
ABOARD TRANSMIT 
STANDING OPERATE 
FREQUENCIES INNER 
UNDERSTOOD STAND BY 


The following 33 words were heard between 51 and 60% of the time: 


WE 
I’LL

UNCLE 
REQUEST 
TRANSMITTING 


SLOWER 


WING HOW 
EARPHONES MY 

AIRCRAFT MINUTE (time) 
BACK DOWN 
BACKGROUND LEVEL 

BEGIN WAS 

















SPEAK 

ILS 
INOPERATIVE 
PETER 

READ (REED) 


The following 50 words were 


The following 39 words were 


HAVE 
SEND 
APPROACH 
CONTROL 
COMING 
TALK 
WOULD 
AM 

US 
EYESTRAIN 
TURNING 
MISTAKE 
CONFIRM 




VERIFY 
JUST 

WEAK 
ALTIMETER 


AFFIRMATIVE 


LISTEN 
STILL 
ADVISE 
MANAGE 
ARE 


heard between 41 and 50% of the time: 


NOT 
DECIMAL 
CALLING 
LOVE 
FADING 
DOES 
CALL 
LOUD 
WILCO 
INITIAL 
DOG 
TWICE 
TOWER 
POINT 
CLEARED 
COPY 
MIDDLE 


heard from 31 to 40% of 


GOT 

TO 
BOOMING 
HAZING 
SPELL 
BLOCKED 
SAY 

LOW 
CLOSER 
INBOUND 
FURTHER 
EXCEED 
CONTACT 


FOR 

USE (noun) 
DAY 
RANGE 
GAINS 

GO 

AND 
WIND (noun) 
POSITIVE 
MAKING 
GIVE 

FOG 
The following 29 words were heard from 21 to 30% of the time:


LONG OFF HE 
MAY PARTLY CHAIN 
OF STATION TEST 
WHAT HIM THEY 
CUT WATCH GOES 
FADED SHORT LIGHT 
BUT YOU’RE WORDS 
DID COUNT KING 
IT FROM CAN 
LAST STRENGTH 


The following 41 words were heard from 1 to 20% of the time: 


I’VE SEAL THAT’S
GREEN LEAVE BY 

NEED THANK STEEL 
FADES BUTTON WORD 
HOLD GET ME 
SIDE-TONE COAL IN 

PART FINE YOU’VE 
WE'VE HE’S FREQ (freak) 
BASIC TARE HIS 

BEEN NAN SHUT 
WE'RE COME HEADSET 
ROG (Rahj) BREAK YET 
RAMP THE COMES 
AN RING 


Those familiar with this type of testing can assess these results by comparing 
certain particular words as they happened to rate in other familiar tests. It must be 
pointed out that since these were the first articulation tests conducted by the staff 
of this particular project, there was a need for improvement in the manner of con- 
ducting the tests, so these results were treated as only tentative. 
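The result lists above group words into bands by the share of listeners who heard them correctly. A hypothetical Python sketch of that scoring step follows. It assumes that scores are rounded to whole percentages, since the text does not say how fractional scores were assigned. As in the text, the top band includes the words heard correctly by every listener.

```python
# Bands used in the result lists, highest first: (lower, upper, label).
BANDS = (
    (75, 100, "at least three-fourths"),
    (61, 74, "61-74%"),
    (51, 60, "51-60%"),
    (41, 50, "41-50%"),
    (31, 40, "31-40%"),
    (21, 30, "21-30%"),
    (1, 20, "1-20%"),
)


def percent_correct(correct: int, responses: int) -> int:
    """Share of responses that reproduced the word, as a whole percentage."""
    return round(100 * correct / responses)


def band(score: int) -> str:
    """Return the result-list band a whole-percentage score falls into."""
    for low, high, label in BANDS:
        if low <= score <= high:
            return label
    return "not heard"  # 0%: no band for this appears in the lists


print(band(percent_correct(2, 3)))  # 67% falls in the 61-74% band
```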

Before discussing the next steps let us first glance at the results in a general way. 
Most monosyllabic words rated low, as might be expected (French and Steinberg, 
1947), but not all multisyllabic words rated high. Some short words appeared in 
much better light in subsequent tests of whole phrases, whereas even some long 
words seemed to lose the high ratings of this first test when they showed up as the 
missing or mistaken elements of the phrase-tests which followed. Many seemingly
promising longer words failed to come through when tested in whole phrases, with
added factors of context, elision, juncture and stress. A few short words proved
highly intelligible: OVER, GOOD, VERY, WHEN, VOICE, ABLE, I, AT, OUT,
RIGHT, WILL; and that calls for an explanation. When COMPLETE, ABLE,
AFTER, and CHANGE appear in the list of easily understood words, naturally we
expect to find COMPLETELY, UNABLE, AFTERNOON, and CHANGING. But
what made some shorter words so intelligible? And why are some of the longer
words understood poorly? In order to discover the phonetic characteristics which
distinguish one group of words from another, it is necessary to examine each category 
to see what characteristics the words in it have in common—whether they have 
certain prevalent vowels or consonants or sequences of them—and how they contrast 
with the features of other groups of words of a different rating. 


FORMULATING TENTATIVE HYPOTHESES 


If we compare the list of 76 words that were correctly heard in three-fourths of 
their occurrences with the group of 108 words that rated 40% or less, we find one 
important factor that must be taken into account. The word-groups compared here
differ from each other in that the first group has the greater number of syllables
and sounds but the smaller number of words. The poorly-heard group contains 137
syllables and 401 sounds and has 108 separate words or units of meaning, whereas
the well-heard group, although it contains more syllables (175) and sounds (421),
has just 76 words; put the other way, the poorly-heard group has over 40% more words.
Consequently the number of auditory stimuli that make up a meaningful unit for
recognition is larger in the well-heard group than in the other. In the poorly-heard
group each single auditory stimulus is therefore more essential, being more closely
associated with a semantic unit. Therefore
some speech elements having unsatisfactory intelligibility might appear just as 
frequently in the one group as in the other (this is actually the case with /n/, for 
instance), yet such elements might prove decisive in placing certain words in the 
poorly-heard group. The technique of comparing the two groups phonetically therefore 
has certain limitations, but it also has some utility. 
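The multiple-stimulus argument can be made concrete by dividing the totals quoted above by the number of words in each group. The Python code below does only that arithmetic on the figures in the text. The helper name is hypothetical.

```python
def per_word(words: int, syllables: int, sounds: int) -> tuple[float, float]:
    """Average syllables and sounds per word for a group of test words."""
    return syllables / words, sounds / words


WELL_HEARD = (76, 175, 421)     # words, syllables, sounds (heard >= 75%)
POORLY_HEARD = (108, 137, 401)  # words, syllables, sounds (heard <= 40%)

for name, group in (("well-heard", WELL_HEARD), ("poorly-heard", POORLY_HEARD)):
    syl, snd = per_word(*group)
    print(f"{name}: {syl:.2f} syllables and {snd:.2f} sounds per word")
```

This prints 2.30 syllables and 5.54 sounds per word for the well-heard group, against 1.27 and 3.71 for the poorly-heard group.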

Although the distribution of sounds in the two groups is on the whole very 
similar, there are some differences on the basis of which we may form the following
hypotheses:

(1) the consonant combinations /st/ or /ts/ and the consonant /h/ give rise to
errors;

(2) diphthongs and long vowels carry better than short-duration vowels, particularly 
those followed by a stop. 

In the lists rating 40% or less, we find the following words containing /st/ or /ts/: 
INDISTINCT, ITS, STATION, SUGGEST, EYESTRAIN, MISTAKE, TEST, 
LAST, STRENGTH, THAT’S, and STEEL: 11 out of 137 syllables or 8%. In 
the well-heard list this combination occurs only once in 175 syllables or only 
about 0.6%. 
Next comes /h/, which occurs in the following poorly-heard words: HAVE, HAZING,
HE, HIM, HOLD, HE’S, HIS, HEADSET. This makes 8 syllables out of 137 or 
about 6%. In the better group it occurs twice in 175 syllables or only about 1%. 

Let us now turn to the second hypothesis: that diphthongs and long vowels carry 
better than short-duration vowels, particularly those followed by a stop. In the 
list of words heard poorly, this is exemplified by the following: GOT, TALK, 
BLOCKED, JIG, WOULD, WHAT, CUT, SHORT, BUT, DID, IT, SHUT, 
HEADSET, ROG (rahj), GET, YET, BREAK, FREQ (freak), and others having 
short-duration vowels without terminal stops. Taking this enumeration alone, for 
the present, there are 18 out of 137 syllables or 13%, whereas in the better-heard 
group only REPEAT, AT, OUT, and possibly GOOD appear to be in this category: 
at most 4 out of 175 syllables or about 2%. 
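The percentages in the preceding paragraphs are plain proportions of syllables. The contrast between the groups may be clearer when it is also expressed as a ratio. The minimal Python sketch below recomputes these proportions from the counts given in the text: 137 syllables in the lists rating 40% or less, and 175 in the well-heard list. The `FeatureCount` structure and the poor-to-good ratio are illustrative additions and are not part of the original analysis.

```python
from dataclasses import dataclass


@dataclass
class FeatureCount:
    """Syllable counts for one phonetic feature in the two word groups."""
    feature: str
    poor_hits: int  # syllables with the feature in the poorly heard lists
    good_hits: int  # syllables with the feature in the well-heard list


POOR_TOTAL = 137  # syllables in the lists rating 40% or less
GOOD_TOTAL = 175  # syllables in the well-heard list


def percent(hits: int, total: int) -> float:
    """Share of syllables carrying the feature, in per cent."""
    return 100.0 * hits / total


def summarize(counts):
    """Return (feature, poor %, good %, poor/good ratio) for each feature."""
    rows = []
    for c in counts:
        p = percent(c.poor_hits, POOR_TOTAL)
        g = percent(c.good_hits, GOOD_TOTAL)
        rows.append((c.feature, p, g, p / g))
    return rows


# Counts as stated in the text.
COUNTS = [
    FeatureCount("/st/ or /ts/", 11, 1),
    FeatureCount("/h/", 8, 2),
    FeatureCount("short vowel + stop", 18, 4),
]

if __name__ == "__main__":
    for feature, p, g, ratio in summarize(COUNTS):
        print(f"{feature:20s} poor {p:5.1f}%  good {g:4.1f}%  ratio {ratio:4.1f}")
```

Run as a script, it prints one line per feature. On these counts, /st/ or /ts/ is roughly fourteen times as frequent per syllable in the poorly heard lists as in the well-heard list.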

The following words also merit attention with respect to shortening of vowels or
the presence of diphthongs. In the “poor” list we find the following words almost
lacking in vowels, chiefly because /m/, /n/, /ŋ/ or some other voiced consonant
follows or precedes so as to absorb or contaminate the vowel: COMPLY, SEND,
BOOMING, INCLINE, BLOCKED, AM, I’M, INBOUND, MISTAKE, CONFIRM,
LONG, STATION, HIM, WATCH, COUNT, KING, CAN, STRENGTH,
BUTTON, BEEN, COME, RAMP, COMES, AN, RING, GREEN, THANK, SIDE-TONE,
FINE. Here are 35 instances out of the 137 syllables, or 27%.

The fact that certain words have been included in more than one category to 
illustrate more than one vice does not weaken the case: it is to be expected that 
the words heard most poorly will have at least one, and may have several compounded 
detrimental features. In contrast, although the longer words on the “best” list contain
many shortened vowels, their superior intelligibility can be explained by the
multiple-stimulus theory already mentioned above. It is the shorter words in that
group which should be noted: OUT, O.K., NOW, AT, I, ABLE, GOOD, VOICE, RIGHT,
WILL, BEFORE, AGAIN, THEIR, READY, OUTER, HELLO. There are three patterns
present: (1) firm, clear diphthongs, as in RIGHT, OUT, NOW, O.K., OUTER, VOICE,
ABLE, I; (2) BEFORE, AGAIN and HELLO have accents on the second syllable and may
be heard well because the listener’s acuity has been alerted by the first syllable; and
(3) READY and OUTER have two syllables each, one of which can itself stand as an
independent word.

It must be noted that these explanations cannot be reversed to imply that the
poorly-heard list contains no words at all with these characteristics; for in the
poorly-heard words other phonetic features contaminate and otherwise vitiate the
merits of diphthongs, and sometimes even of multiple syllables and stress patterns.
These first hypotheses respecting less intelligible speech sounds, important though
they are, do not tell the whole story. As we saw in the case of /n/, some speech
sounds that are unintelligible in certain contexts seem to be heard adequately in
others. Can the first hypotheses be supplemented by others as a result of studying
whole phrases or sentences? And do they survive such further analysis? In turning
to articulation testing of whole phrases, we find that a new and different use can be
made of the results of such tests. But first, here are some sample results from them:


Phrases Heard by All Listeners

YOUR COMPLETE IDENTIFICATION
DO YOU READ
YOUR IDENTIFICATION
SAY AGAIN
NOT ABLE TO READ
STATION CALLING
LOUD AND CLEAR
SAY YOUR MESSAGE NOW
DID YOU CALL
HOLD YOUR MESSAGE
THAT IS CORRECT
HOW DO YOU READ ME
IS THAT CORRECT
UNABLE TO READ
TRANSMISSION INTERMITTENT
HOW DO YOU HEAR ME
TRANSMISSION NOISY
IS THAT RIGHT
READ YOU POORLY
DID YOU READ
DO YOU STILL READ
YOU ARE INTERFERING
NOT ABLE TO UNDERSTAND
GIVE ME A SHORT COUNT
OPERATING ON THAT FREQUENCY
CHANGE TO THAT FREQUENCY
READ BACK EVERYTHING
BEFORE THAT
YOU’RE NOT COMING IN AT ALL
DID YOU GET THAT
WILL YOU REPEAT THAT PLEASE
WILL YOU SAY YOUR LAST MESSAGE AGAIN
I REPEAT
THAT’S CORRECT
THAT’S RIGHT


It will be noted that many words that rated low individually were heard successfully 
in the phrases. Thus the intelligibility of these phrases and of their constituent words 
when in that context seems to be established—at least from an operational point 
of view. 

How can the probability of achieving intelligible phrases be established without
running series of phrase-tests in each case? How can such tests be used only
initially, to establish certain principles or formulae governing intelligibility? It is
here that the less intelligible phrases come into their own, for the analysis of poorly
understood phrases yields the most useful data regarding intelligibility: here we find
out which words fail to come through to a listener and which are misinterpreted. It
is not the correctly heard phrases that matter most in this instance but the errors.
Analysing them can prove quite fruitful, and from that analysis we are able not only
to strengthen our earlier hypotheses but to add to them. By noting which words are
missing from the listeners’ transcriptions and what sounds are interpolated or
misapprehended, we can discover something about the speech sounds which carry and
those which are most readily mistaken for others. This is the sort of information that
is not available from conventional articulation tests performed on separate isolated
words. Here are some examples of poor test scores with comments interpolated:


Original Version
    Some Versions Heard
    (comment)

HOLD YOUR MESSAGE A MINUTE
    —— YOUR MESSAGE MISSED IT / —— YOUR MESSAGE ——
    (hold is missing; /n/ in minute)

STAND BY
    —— BY / AM BY ——
    (/st/ is missing; /d/, possibly because of /n/, is also missing)

READ BACK
    —— BACK / FEEDBACK / READ AT ——
    (read is deficient here; /ak/ proves weak)

SPEAK MORE SLOWLY
    ALONE EAT SLOWLY
    (/sp/; /ik/; /m/; /or/; “alone” has the vowel of slowly)

TRANSMISSION BROKEN UP
    MISSION BROKE UP
    (/tr/; /n/; /z/)

RANGE CLOSED
    CLOSED
    (range, probably because of /ndʒ/)

TRANSMISSION FADES IN AND OUT
    TRANSMISSION —— OUT
    (/f/; short-duration vowel; /dz/ followed by another shortened vowel and /n/;
    but trans comes through)

TRANSMISSION COMES AND GOES
    TRANSMISSION TO AND FRO
    (comes and is missing; /g/, /z/ are weak; “to and” is a semantic interpolation;
    “fro” shows the vowel sound of goes)

GIVE ME A LONG COUNT
    GIVE A WALK OUT / GIVE ME A LONG ——
    (long; “walk” may be an interpolation; /k/ and /n/ of count fail, but the vowel
    and stop remain)

CAN YOU STILL READ ME
    CAN YOU —— READ ME
    (still has /st/, a short-duration vowel, and /l/)

CALL AGAIN CLOSER
    CALL AGAIN GOES —— / ALL AGAIN
    (/k/ weak; /kl/ heard as /g/; -/r/ is lost)

OPERATIVE FROM NOW TILL THEN
    OPERATOR NOW AND THEN
    (unstressed from and till not heard; -ive not heard)

WHEN WILL IT OPERATE
    GLENN WITT OPERATE
    (/hw/, /n/ and /l/ missing; main vowels all heard)

CHANGING TO THAT FREQUENCY
    CHANGING TWO —— FREQUENCY
    (that)

GIVE ME A CHECK ON THIS FREQUENCY
    GIVE ME —— ON THE FREQUENCY
    (a check: /tʃ/, short-duration vowel, and /k/)

CHECK MY TRANSMISSION ON THIS FREQUENCY
    LET MY TRANSMISSION —— THIS FREQUENCY
    (check again; /t/ for /k/; on has weak stress and /n/)

YOU ARE INTERFERING ON THAT FREQUENCY
    YOU ARE HEARING THAT FREQUENCY
    (interf- contains /nt/, -/r/ and /f/)

LEAVING TOWER FREQUENCY
    LEAVE OUR FREQUENCY
    (-ing after /v/; /t/ is lost)

PLEASE SPELL WORD AFTER THAT
    TELL AFTER THAT
    (please due to /l/?; /sp/ is missing; -/r/ affects word?)

A MESSAGE FOR HIM FROM THEM
    MESSAGE FOR ——
    (/h/, /m/, /fr/, /ð/ all are lost)

READ YOU INDISTINCT
    READ YOUR ——
    (indistinct: two /n/’s, /st/ and a short-duration vowel)

YOU FADED OUT ON US
    PHASE OUT OF ——
    (you unstressed before faded; -ded of faded is lost; /n/ and /s/ missing; the
    vowel of “of” is from on)

SWITCH OVER TO BAKER CHANNEL
    WHICH OVER —— TAKE A ——
    (/s/ is lost; -/r/ of baker is lost; /t/ for /b/ may be semantically prompted)

GIVE ME JUST A SHORT CALL
    GIVE ME —— SHORT HAUL / GIVE ME BUT ——
    (call contains /k/ and /l/; /k/ replaced by /h/; /l/ missing; vowel heard once)


It can be seen that this type of test makes it possible to “doctor up” the
phonetically deficient portions of each phrase by substituting more intelligible
synonyms or by reconstructing the phrases. The following words were missed at
least once: HOLD, MINUTE, STAND, READ, SPEAK, MORE, RANGE, FADES,
IN, AND, COMES, GOES, LONG, COUNT, STILL, CLOSER, FROM, WHEN,
THAT, THIS, CHECK, ON, PLEASE, SPELL, WORD, HIM, THEM,
INDISTINCT, FADED, US, CHANNEL, JUST, CALL, A, THE. In addition,
we must note the following errors, for they carry even more of the information
we need about weak speech elements: “missed it” for MINUTE, “am” for STAND,
“eat” for SPEAK, “broke” for BROKEN, “fro” for GOES, “walk” for LONG,
“out” for COUNT, “operator” for OPERATIVE, “glenn” for WHEN, “witt”
for WILL IT, “let” for CHECK, “hearing” for INTERFERING, “our” for
TOWER, “tell” for SPELL, “phase” for FADED, “which” for SWITCH, “haul”
for CALL. It can be noted that the vowels suffer less distortion than the consonants.
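Part of this comparison can be mechanized. The sketch below rests on an assumption and is not the procedure used in the study. It aligns the original phrase and a listener's version word by word with the standard-library `difflib.SequenceMatcher`. It then sorts the differences into omissions, substitutions and interpolations. The function name `compare_transcription` is illustrative. The phonetic diagnosis recorded in the comments above still has to be made by ear. A purely orthographic alignment also cannot tell a semantic reconstruction from a genuine mishearing.

```python
from difflib import SequenceMatcher


def compare_transcription(original: str, heard: str) -> dict:
    """Align an original phrase with a listener's version, word by word.

    Returns three lists:
      omitted      - original words with no counterpart in the version heard
      substituted  - (original words, words heard in their place) pairs
      interpolated - words heard that have no counterpart in the original
    Gap markers such as "——" in a transcription are dropped before aligning.
    """
    orig_words = original.upper().split()
    heard_words = [w for w in heard.upper().split() if w.strip("—-")]
    result = {"omitted": [], "substituted": [], "interpolated": []}
    matcher = SequenceMatcher(a=orig_words, b=heard_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            result["omitted"].extend(orig_words[i1:i2])
        elif tag == "replace":
            result["substituted"].append(
                (" ".join(orig_words[i1:i2]), " ".join(heard_words[j1:j2]))
            )
        elif tag == "insert":
            result["interpolated"].extend(heard_words[j1:j2])
    return result


if __name__ == "__main__":
    print(compare_transcription("HOLD YOUR MESSAGE A MINUTE",
                                "—— YOUR MESSAGE MISSED IT"))
```

For the first example above, the sketch reports HOLD as omitted and “missed it” as a substitution for A MINUTE. Deciding that the substitution really concerns MINUTE is left to the analyst.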

Noting errors and omissions in this manner, it was possible to build up a useful
list of weak words and then to study it for features common to the entire list or to
certain combinations of sounds; categories of words can then illustrate certain
principles regarding intelligibility. When we take these words, together with much larger
lists of words missing from phrases or misunderstood in phrases, and study how they
become transmuted when they are misunderstood, we can formulate certain
hypotheses in addition to those formed on the basis of the single-word tests. The
following sounds give rise to trouble in the reception of messages:


(a) /n/
MINUTE, STAND, RANGE, IN, AND, COUNT, WHEN, ON, INDISTINCT,
CHANNELS, BROKEN
Here we have 11 of the 28 weak words.

(b) /r/ and /l/
HOLD, MORE, CLOSER, STILL, SPELL, WORD, INTERFERING, TOWER
Here we have 8 of the 28 weak words.

(c) /k/ and /g/
COMES, GOES, COUNT, CLOSER, CALL

(d) /f/, /s/, and /ð/
FADES, CLOSER, THAT, THIS, THEM, FADED, US, THE, FROM

(e) Unvoiced elements tend to be less distinct one from another than from voiced
elements:

BACK and AT, SPEAK and EAT, CHECK and LET
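A small tally can check the proportions quoted for (a) and (b) and pick out the words that fall under more than one heading. The Python sketch below assumes that the base of 28 weak words used in the text applies to every category. The text gives no percentages for (c) and (d), so the figures computed for them here are illustrative only.

```python
from collections import Counter

# Weak words grouped by troublesome sound, as listed under (a)-(d) above.
CATEGORIES = {
    "/n/": ["MINUTE", "STAND", "RANGE", "IN", "AND", "COUNT", "WHEN", "ON",
            "INDISTINCT", "CHANNELS", "BROKEN"],
    "/r/, /l/": ["HOLD", "MORE", "CLOSER", "STILL", "SPELL", "WORD",
                 "INTERFERING", "TOWER"],
    "/k/, /g/": ["COMES", "GOES", "COUNT", "CLOSER", "CALL"],
    "/f/, /s/, /ð/": ["FADES", "CLOSER", "THAT", "THIS", "THEM", "FADED",
                      "US", "THE", "FROM"],
}
TOTAL_WEAK_WORDS = 28  # base used in the text for (a) and (b); assumed for all


def category_shares(categories, total):
    """Map each category to (number of words, rounded percentage of total)."""
    return {name: (len(words), round(100 * len(words) / total))
            for name, words in categories.items()}


def multiply_flagged(categories):
    """Words listed under more than one category, sorted alphabetically."""
    counts = Counter(w for words in categories.values() for w in set(words))
    return sorted(w for w, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(category_shares(CATEGORIES, TOTAL_WEAK_WORDS))
    print(multiply_flagged(CATEGORIES))
```

On these lists, COUNT and CLOSER appear under more than one heading. This fits the earlier remark that the weakest words often carry several compounded detrimental features.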

Thus far, by using just a restricted word list and a limited vocabulary from a very 
few articulation tests, we have simply illustrated the method that can be followed 
in deducing certain hypotheses regarding intelligibility. By far the most useful step 
is the analysis of errors made by listeners when hearing phrases, because each error, 
provided it is not a semantically-prompted reconstruction from mere fragments of 
auditory clues, pin-points the phonetically weak speech elements and those with which 
they are most likely to be confused. Even some reconstructions show the missing or weak 
elements. An additional benefit from working with phrases is the fact that the effects 
of juncture, elision, stress, and rhythm patterns of normal speech all are present in 
the usual, natural way. 


THE SECOND SET OF HYPOTHESES 


Here is a summary of the hypotheses that were derived respecting the phonetic 
features of intelligibility. These naturally have resulted from a much more extensive 
study than has been illustrated here. 

First there are two corollaries: 

(1) Generally speaking, monosyllabic words are less intelligible than multi-syllabic
words; but this is not true of long words that have weak sounds sharply
stressed, for example “component”. Monosyllabic words which have poor phonetic
components rate the lowest of all speech units as far as intelligibility is concerned, and
they afford the greatest likelihood of mistaken comprehension or of contamination
of other elements in phrases. But the division of words into syllables is itself a
procedure that tends to follow orthography and spelling rules rather than the real
phonetic characteristics of words. Moreover, this rule about monosyllabic words
is treacherous on still another count: it must be tempered with the knowledge that
some multi-syllabic words containing indistinct phonetic elements actually have a lower
intelligibility than many monosyllabic words; cf. component and by. Apart from
that, in articulation tests each syllable added to a root word brought approximately
a 5 per cent improvement in the intelligibility rating of the root word; e.g., right,
righteous, righteousness.

(2) Despite the impression a speaker has that consonants make hearing easy 
because of the muscular effort involved—which can be perceived as a concomitant 
of speech—intelligibility is not closely correlated with muscular effort on the part 
of the speaker. Consonant clusters which, it seems, should be distinct in
interpersonal speech because of their muscular requirements fail to transmit clearly
above noise on an electric circuit; e.g., /mpt/ as in pre-empted, /nts/ as in elements,
/str/ as in destroy, /kts/ as in acts, and /kstr/ as in foxtrot. It is interesting to
note the correspondence of these findings with recommendations made by
N. S. Trubetskoy in 1939 for the construction of an international language. Among
other things, he suggested that no word should start with two consonants.

Next there are ten hypotheses regarding intelligibility of specific speech sounds: 

(1) /n/ does not carry above noise and it has the effect of dissipating vowel sounds 
through the nose rather than through the mouth, thus sullying or weakening them. 
The reader might ask a Mr. Noonan—if he knows one—whether he has trouble giving 
his name over the telephone. /m/ is only slightly better, cf. come sooner, consumer. 
/ŋ/ is passable because its occurrence is more restricted and it never occurs initially.

(2) /r/ and /l/ really constitute semi-vowels in the speech of many persons.
The pronunciation of both of these elements also differs markedly in various parts
of the world. /r/ ranges from retroflex or uvular to tongue-trilled, and it is
often completely omitted after a vowel; more often still, it changes the value of the
vowel itself; e.g., wore as “woah”, “woe”, “wuh” or “wah”.

(3) /f/, /s/, /θ/ and /ʃ/ tend to be indistinct against noise in electrosonic
transmission; e.g., face, phase, safe, shave, thin, sin, shin, shift and surface.

(4) /h/ not only does not carry well among secondary users of English, but it 
tends to be pronounced by them as other sounds, predominantly /x/ which is heard 
as /k/ and leads to mistaken comprehension. Even for English users, /h/ is a weak
sound against noise and through electric apparatus; its principal effect is to space
other sounds, a time-function like that of /m/ and /n/. Partly because of /h/, hold
was heard as call not just once but as many as 21 times in a single test with a group
of European pilots.

(5) /st/ tends to be weak or to get lost in electrosonic transmission: straight is
heard as rate and eight. /st/ also makes for awkward juncture; e.g., first stage, first aid.

(6) Words or syllables in which /r/ follows another consonant tend to be indistinct 
partly because of vowel contaminations: e.g. true, drew. 

(7) The distinction between voiced and unvoiced speech elements remains passable 
against background noise throughout: /b/ is as distinct from /p/ as is /t/; e.g., 
tail and pail are less distinct from each other than either of them is from bail, except 
that /k/ and /g/ do not contrast well against noise, perhaps partly because as velar 
stops they tend to influence or be influenced by conformations of the oral cavity 
required for adjacent vowel sounds formed between the uvula and the lips; e.g., 
card, guard, crew, grew, cane, gain, cage, gauge, leg, lake. 

(8) Most of the stops (/p/, /t/, /k/) tend to sound alike against noise. But they 
contrast well with the vocalic elements ; consequently, they become obscure only 
when there occurs a succession of monosyllables employing stops or when a critical 
word resembling another in both sound and potential context might be involved: 
e.g., cut, cup, cf. jut. 

(9) The decisive criterion for intelligibility of syllables is associated with the 
attenuation of vowel sounds. When vowel sounds are shortened either by being 
surrounded by stops as in the word beet, or by a stress placed on a subsequent syllable
(for example, the weakening of the first “e” in descend), intelligibility is decreased
markedly.

(10) Diphthongs often seem the clearest of vowel elements, probably because of 
their combined vowel length. Combinations of semi-vowels with vowels come next;
cf. boy, I, you.

Finally, there is a word of caution regarding the application of these hypotheses 
to language structuring: 

Phonetic principles such as the foregoing apply to isolated individual words and 
they are not necessarily valid when those words are used in different phrases. 
Sounds which may be distinct when used initially or when used finally may be 
rendered indistinct through close juncture with other words in a phrase. For example,
up when added to climb brings out the /m/ (“clime-up”), but when added to gusts
it has the effect of prolonging the /s/ to the stop /p/ (“gussup to...”), which then
becomes so staccato as to be indistinct. Consequently gusts reaching is better than
gusts up to (assuming that the word gusts cannot be replaced easily). Moreover,
because of their greater length, phrases afford more contextual and meaning-bearing
association than do words, and that enhances listeners’ perception, making intelligible
some weak sounds and syllables which otherwise would become lost.


DISCUSSION OF FUTURE APPLICATION OF THESE FINDINGS 


The suggested methodology can be used both for testing these hypotheses and for
deriving others. It involves, first, articulation tests of phrases on a scale large
enough to make possible an extensive study of errors; and second, verification
of the hypotheses by articulation tests of selected special lists. These should be lists
of words balanced phonetically and equalized in length except with respect to the
particular phonetic features under investigation. Phrases, preferably short ones, should
also be tested.

A separate investigation of the relative durations and intensities of various sounds 
also might prove fruitful. If a scale of measurement were devised by means of which 
the values could be established for various vowels, diphthongs, and semi-vowels;
and if furthermore the duration of less distinct sonant elements among the consonants 
in contrast to the more easily heard speech elements were measured (and all these 
measurements would have to be normative for the speech or special language chosen 
as a base), then criteria might become available for more precise and more accurate 
structuring of language for intelligibility. Although such a scale could be applied 
to the separate words of any given vocabulary, its main ultimate utility would lie 
in its application to the direct analysis and construction of sentences and phrases 
of speech, which after all is the main objective of intelligibility investigations. 

A more immediately expedient approach might be to accept these or similar 
hypotheses merely as working hypotheses and use them for the construction of 
languages of a specialized nature that will prove more intelligible than normal speech,
or for the formulation of clichés used in recorded messages. In such instances, the 
end-products in some cases might be tested operationally or functionally without 
waiting for slower and surer step-by-step verifications of each hypothesis. 


CONCLUSION 


In proceeding from articulation testing toward formulas for intelligibility we suggest 
a return to separate speech sounds as the basic step. First we must analyze the 
results of various articulation tests with respect to the constituent speech sounds 
found in each—the number of times each appears and is heard, and the relative 
positions of each speech sound in each word that is heard or not heard. That is only 
a start. Next we must supplement those findings and modify them by studying the 
results of phrase-tests, noting particularly the speech sounds which are lost or become 
replaced by others. The second step is the crucial one. Though less satisfying 
statistically, it is far more fruitful in showing the behaviour of speech sounds in 
clusters and in various sequences. The hypotheses that result from such methods 
can be used to structure stock-phrases or languages having special vocabularies, with 
resulting improvement in intelligibility. 
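Once the test words have been transcribed, the first step proposed here is easy to mechanize. That step tallies each speech sound by the number of times it appears, the number of times it is heard, and its position in the word. The Python sketch below is hedged accordingly. The ARPAbet-style symbols, the function names and the four-word sample are illustrative assumptions. The sample merely pairs two words from the well-heard list with two words that were missed in the phrase tests; it is not data from the study.

```python
from collections import defaultdict


def position_of(index: int, length: int) -> str:
    """Classify a sound's position in a word (a one-sound word counts as initial)."""
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def tally_sounds(results):
    """Count, per (sound, position), how often it occurs and how often it is heard.

    `results` holds (phonemes, heard) pairs: the sounds of a test word in order,
    and whether listeners reported that word correctly.
    Returns {(sound, position): [occurrences, times_heard]}.
    """
    table = defaultdict(lambda: [0, 0])
    for phonemes, heard in results:
        for i, sound in enumerate(phonemes):
            cell = table[(sound, position_of(i, len(phonemes)))]
            cell[0] += 1
            if heard:
                cell[1] += 1
    return dict(table)


def heard_rate(table, sound, position):
    """Proportion of occurrences heard, or None if the sound never occurred there."""
    occurrences, heard = table.get((sound, position), (0, 0))
    return heard / occurrences if occurrences else None


# Hypothetical sample: ARPAbet-style transcriptions with illustrative outcomes.
SAMPLE = [
    (["HH", "OW", "L", "D"], False),  # HOLD
    (["K", "AW", "N", "T"], False),   # COUNT
    (["N", "AW"], True),              # NOW
    (["AW", "T"], True),              # OUT
]

if __name__ == "__main__":
    table = tally_sounds(SAMPLE)
    for (sound, position), (n, h) in sorted(table.items()):
        print(f"{sound:3s} {position:8s} heard {h} of {n}")
```

The same table, built from the missed and misheard words of the phrase tests, would supply the second step described above.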
THE READING OF MESSAGES OF DIFFERENT TYPES AND
NUMBERS OF SYLLABLES UNDER CONDITIONS OF 
DELAYED SIDE-TONE 


John W. Black
The Ohio State University, Columbus 


Eight sets of phrases were constructed for oral reading. Two sets were comprised 
of five syllables, in one instance consonant-vowel syllables and in the other, vowel- 
consonant. Three sets were made to include only syllables of two, three, or four 
sounds and a total of either 15 or 16 sounds. Three additional sets were the same 
as the preceding ones except that the total number of sounds was either 20 or 21. 

Each group of twenty-four young male adults read a set of the phrases under 12
conditions of delayed side-tone ranging from zero delay to 0.30 sec. delay. The
duration of the oral phrase was measured.

The reading of phrases of vowel-consonant syllables was more adversely affected 
by delayed side-tone than was the reading of consonant-vowel syllables, and the 
disparity between the two increased as the amount of delay of side-tone was increased 
to 0.21 sec. Otherwise syllables and phrases of different lengths were responded to 
“alike,” there being no interaction between the amount of delay of side-tone and 
the structure of the phrase. 


Side-tone, the auditory experience of one’s own talking, that is delayed within the 
range from 0.03 to 0.30 sec. typically retards the rate of oral reading among adults. 
A description of the effect usually makes reference to the monitoring system and 
posits a close relationship between one’s monitoring of his speech and his progress 
in oral reading. The unit that is monitored is elusive: perhaps an interval of time
that is quantized by the monitoring system, or perhaps a unit of language.

The writer’s work with delayed side-tone has focused upon the phoneme and the 
syllable as units that may be pivotal in the speaker’s evaluations of his speech. 
Stimulus phrases of limited numbers of syllables, usually five, have provided convenient 
material for reading aloud. These messages permit the study of the effects occasioned 
by delayed side-tone during “on-going speech,” i.e., within a span that is not
confounded with periods of inhalation. Fortuitously, the duration of the phrase 
provides a convenient physical measure, one that is amenable to statistical processes 
and that allows comparisons readily. 

The present study was planned to explore the relative influence of delayed side-tone 
upon rate of reading when forms of the stimulus material were systematically 
differentiated. Specifically, phrases were written (a) that differed in the form and 
length of syllables, (b) that differed in the number of syllables, and (c) that differed 
in the number of phonemes. 


PROCEDURE 


Eight sets of stimulus materials were constructed, each consisting of five phrases: 
(1) five consonant-vowel (CV) syllables, e.g., “today by the sea”; (2) five
vowel-consonant (VC) syllables, e.g., “an aid in earache”; (3) separate sets of two-,
three-, and four-sound syllables totalling 15-16 phonemes; and (4) three sets, the same
as the foregoing but totalling 20-21 phonemes.

Twenty-four male university students read each list, 192 students serving as 
experimental subjects. 

Each subject participated individually. He read an assigned list of five phrases 
aloud until he said that he was completely familiar with it ; he then read the list 
under 12 conditions of side-tone, once at the outset with no delay and once at each 
delay time 0.00, 0.03 . . . 0.30 sec. The sequence of the latter 11 conditions 
was randomized. 
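The order of conditions for one reader can be sketched as follows. The text says only that the 11 conditions after the initial reading were randomized. The seeded `random.Random` generator and the function name `session_schedule` are assumptions made for illustration.

```python
import random

# The 11 delays of the randomized series: 0.00, 0.03, ..., 0.30 sec.
DELAYS = [round(0.03 * k, 2) for k in range(11)]


def session_schedule(rng: random.Random) -> list:
    """Return the 12 side-tone delays for one reader, in reading order.

    The first reading is always made at zero delay; the 11 delays of the
    series (which include a second 0.00-sec. condition) follow in random order.
    """
    randomized = DELAYS[:]  # copy, so the module-level list stays sorted
    rng.shuffle(randomized)
    return [0.0] + randomized


if __name__ == "__main__":
    print(session_schedule(random.Random(1952)))  # seed is arbitrary
```

This arrangement yields the two zero-delay readings per subject that are mentioned later in the results.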

The Audio Signal Delay Unit (Marple, 1952), fed by a condenser microphone
that was suspended from the headset and positioned at the corner of the speaker’s
mouth, delivered his speech to his ear at approximately 95 dB re 0.0002 dyne/cm²,
this level varying upward and downward with the level of the input. The reading
was recorded on both a tape recorder and a power-level recorder. The measure that
entered into the statistical analysis was the mean duration of the five phrases within
a set as read at one condition of delayed side-tone.
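For orientation, the quoted level can be converted to a sound pressure with the standard relation L = 20 log10(p/p_ref). With p_ref = 0.0002 dyne/cm² (20 µPa), 95 dB corresponds to roughly 11 dyne/cm², or about 1.1 Pa. The short Python check of that arithmetic below uses illustrative names.

```python
import math

P_REF = 0.0002        # reference pressure in dyne/cm², as quoted in the text
DYNE_CM2_TO_PA = 0.1  # 1 dyne/cm² = 0.1 N/m² = 0.1 Pa


def spl_to_pressure(level_db: float, p_ref: float = P_REF) -> float:
    """Invert L = 20 * log10(p / p_ref) to obtain the RMS pressure p."""
    return p_ref * 10 ** (level_db / 20)


def pressure_to_spl(p: float, p_ref: float = P_REF) -> float:
    """Sound pressure level in dB re p_ref."""
    return 20 * math.log10(p / p_ref)


p_dyne = spl_to_pressure(95)     # roughly 11.2 dyne/cm²
p_pa = p_dyne * DYNE_CM2_TO_PA   # roughly 1.12 Pa

if __name__ == "__main__":
    print(f"{p_dyne:.2f} dyne/cm² = {p_pa:.3f} Pa")
```

Because the delivered level varied with the input, these figures describe only the nominal level.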

The measures were treated by an analysis of variance in the manner designated 
by Lindquist (1953), Type I (mixed design), essentially a series of replications of a 
subjects × treatments plan. In all of the analyses in the present study, the columns
represented the 11 amounts of delay, 0.00, 0.03 ... 0.30 sec., and the replications
represented the categories of phrases that were being compared. The statistical analysis
provided three focal points of interest in the present application: (a) a possible
difference in mean reading time from one experimental condition to another, (b) a
possible difference in mean reading time from one category of phrases to another, and
(c) an interaction between these two, i.e., a greater difference between the mean times
required for saying particular categories of phrases at some delay-times than at other
delay-times. Of the three possible outcomes, a consistent temporal difference between
categories of syllables would be of some interest, somewhat akin to ranking syllables
in ease of being said; a difference between delay-times would be expected. The
crux of the statistical procedure was that it provided for detecting any systematic
interaction between delay-times and categories of syllables.
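In Lindquist's Type I arrangement, each reader serves under every delay (the within-subjects factor A) but reads only one category of phrases (the between-subjects factor B). The degrees of freedom then partition as in the sketch below. The sketch assumes equal numbers of readers per category, and the function name is illustrative. With two categories of 24 readers and 11 delay times, it gives 47 degrees of freedom between subjects (1 for B and 46 for error). Within subjects it gives 480 (10 for A, 10 for AB and 460 for error), as in Table 3.

```python
def mixed_design_df(groups: int, subjects_per_group: int, treatments: int) -> dict:
    """Degrees of freedom for a design with one between- and one within-subjects factor.

    groups             - levels of the between-subjects factor B
    subjects_per_group - readers in each level of B
    treatments         - levels of the within-subjects factor A
    """
    n = groups * subjects_per_group
    df = {
        "between subjects": n - 1,
        "B": groups - 1,
        "error (b)": n - groups,
        "within subjects": n * (treatments - 1),
        "A": treatments - 1,
        "AB": (treatments - 1) * (groups - 1),
        "error (w)": (n - groups) * (treatments - 1),
    }
    # The components must add up to their totals.
    assert df["B"] + df["error (b)"] == df["between subjects"]
    assert df["A"] + df["AB"] + df["error (w)"] == df["within subjects"]
    return df


if __name__ == "__main__":
    print(mixed_design_df(2, 24, 11))  # two categories of phrases
    print(mixed_design_df(3, 24, 11))  # three categories of phrases
```

With three categories of 24 readers, the same partition gives 71 between and 720 within, with 20 degrees of freedom for AB and 690 for error (w), matching Table 2.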


RESULTS 


The analysis that treated the two orders of the consonant and the vowel in two-sound
syllables is summarized in Table 1. The significant F-ratio attached to the
B (groups) factor of the analysis indicates that the mean rates at which Set 1 and
Set 2 of the 10-phoneme phrases were read differed. Of particular interest, the significant
interaction value indicates that the effects of the successive increments of side-tone
delay on the rate at which the CV and VC syllables were read were not “parallel.”
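Each F-ratio is the variance (mean square) of an effect divided by the variance of its error term. B is tested against error (b), and A and AB are tested against error (w). The sketch below recomputes the ratios of Table 1 from its sums of squares and degrees of freedom. It assumes 46 degrees of freedom for error (b), that is, 47 between-subjects degrees of freedom less 1 for B. That figure is what the printed variance of 2.41 implies (110.80/46). The printed F values appear to have been formed from rounded variances, so the recomputed ratios for A and AB come out slightly lower, at about 28.6 and 5.2.

```python
# Sums of squares and degrees of freedom from Table 1
# (46 df for error (b) assumed, as explained above).
SS = {"B": 20.93, "error (b)": 110.80, "A": 25.83, "AB": 4.68, "error (w)": 41.52}
DF = {"B": 1, "error (b)": 46, "A": 10, "AB": 10, "error (w)": 460}
ERROR_TERM = {"B": "error (b)", "A": "error (w)", "AB": "error (w)"}


def mean_square(source: str) -> float:
    """Variance (mean square) = sum of squares / degrees of freedom."""
    return SS[source] / DF[source]


def f_ratio(effect: str) -> float:
    """F = mean square of the effect / mean square of its error term."""
    return mean_square(effect) / mean_square(ERROR_TERM[effect])


if __name__ == "__main__":
    for effect in ("B", "A", "AB"):
        print(f"{effect:2s} F = {f_ratio(effect):.2f}")
```

The same two functions apply to Tables 2 and 3 once their sums of squares are filled in.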

The two sets of “non-parallel” mean values, the mean durations of CV and VC
phrases when read under different conditions of delay of the side-tone, are plotted
in Figure 1 and are included in Table 4, a summary of the mean values of the
different sets of phrases. The duration of VC phrases increased more markedly under
all experimental conditions of delay than did that of CV phrases, and this disparity
increased with the longer times of delay, at least through delays of 0.21 to 0.24 sec.¹

TABLE 1

                              Degrees of    Sum of
Source of Variation              Freedom   Squares  Variance       F
Between subjects                    (47)  (131.73)
  Categories of phrases (B)            1     20.93     20.93    8.69
  Error (b)                           46    110.80      2.41
Within subjects                    (480)   (72.03)
  Delay times (A)                     10     25.83      2.58   28.67
  AB                                  10      4.68      0.47    5.22
  Error (w)                          460     41.52      0.09

Summary of an analysis of variance treating CV and VC five-syllable phrases.

Fig. 1. Duration of five-syllable CV and VC phrases under 11 temporal conditions of side-tone.
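The widening gap between the two five-syllable sets can be read directly from the means in Table 4. The sketch below subtracts the CV mean from the VC mean at each condition. The values are copied from Table 4, and the label “first” stands for the initial zero-delay reading described in footnote 1.

```python
# Mean phrase durations (sec.) for the two five-syllable sets, from Table 4.
CONDITIONS = ["first", 0.00, 0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.30]
VC = [1.55, 1.51, 1.65, 1.87, 2.01, 2.23, 2.32, 2.28, 2.47, 2.45, 2.33, 2.17]
CV = [1.43, 1.35, 1.55, 1.68, 1.66, 1.77, 1.91, 1.91, 1.78, 1.76, 1.79, 1.75]


def disparity(vc, cv):
    """VC minus CV mean duration at each condition, rounded to 0.01 sec."""
    return [round(a - b, 2) for a, b in zip(vc, cv)]


gaps = dict(zip(CONDITIONS, disparity(VC, CV)))

if __name__ == "__main__":
    for condition, gap in gaps.items():
        print(condition, gap)
```

On these figures, the gap grows from 0.16 sec. at zero delay to 0.69 sec. at both 0.21 and 0.24 sec., and then narrows.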

The responses to the three sets of phrases that contained 15-16 phonemes per phrase
were treated statistically in the manner described above. The analysis of variance is
summarized in Table 2 (a). The three sets of phrases, labelled Set 3, Set 4, and Set 5
in Table 4, consisted of eight 2-sound syllables, five 3-sound syllables, and
four 4-sound syllables respectively.

Table 2 (b) summarizes a third analysis of variance, the same as the preceding
ones and treating the same three categories of syllables that were treated in the
preceding analysis, but this time totalling 20-21 sounds per phrase.

The three foregoing analyses treated groups of phrases that, within a group, were
similar in numbers of constituent sounds and dissimilar in the form of the member
syllables. The three sets of 15-16 phonemes varied in duration in keeping with (a)
the delay-time of the side-tone and (b) the form of the syllables; there was no
interaction between these two variables. The analysis of sets of phrases of 20-21 phonemes
yielded the same outcome.

The data provided for a further test, whether or not there was a systematic effect 
on the oral reading time as a consequence of the differences in the lengths of phrases 
that were employed in this study. Interactions between the effects of delay-time and 
of length of message upon the rate of oral reading have been reported (Spilka, 1954),
in one instance between phrases of 10 and of 20 phonemes (Black, 1955).


TABLE 2

(a) 2-, 3-, and 4-sound syllables; 15-16 phonemes per phrase
(b) 2-, 3-, and 4-sound syllables; 20-21 phonemes per phrase

                              Degrees of       (a)               (b)
Source of Variation              freedom  Variance       F  Variance       F
Between subjects                    (71)
  Categories of phrases (B)            2      9.67    3.05     39.85    8.19
  Error (b)                           69      3.17              4.86
Within subjects                    (720)
  Delay times (A)                     10      4.55   28.43      7.16   37.68
  AB                                  20      0.15    0.98      0.29    1.52
  Error (w)                          690      0.16              0.19

Summaries of two analyses of variance, each treating phrases that were similar in the number
of phonemes and dissimilar in the form of the syllable.


¹ The experimental procedure resulted in two values for zero delay, one obtained from the
first experimental reading and the other from the zero-delay condition that was one of the
11 randomly ordered experimental conditions. In the analyses of variance the mean of these
two values was entered in the matrix. Table 4 includes the two means separately. The two
columns differ at the five per cent level of significance.





—E —— 


oo 





ee 





john W. Black 215 


Summaries of three analyses of variance are grouped in Table 3. In each instance
the expected results appear: length of phrase and delay-time both introduced statistically
significant differences in oral reading time. Equally consistently, there was no
interaction between the two.


TABLE 3

(a) 2-sound syllables; 16 and 20 phonemes per phrase
(b) 3-sound syllables; 15 and 21 phonemes per phrase
(c) 4-sound syllables; 16 and 20 phonemes per phrase

                              Degrees of       (a)               (b)               (c)
Source of Variation              freedom  Variance       F  Variance       F  Variance       F
Between subjects                    (47)
  Length of phrases (B)                1     81.28   17.32     85.68   16.87      9.70    4.27
  Error (b)                           46      4.69              5.08              2.27
Within subjects                    (480)
  Delay times (A)                     10      2.94   26.73      6.00   20.68      3.05   23.46
  AB                                  10      0.18    1.64      0.28    0.96      0.15    1.16
  Error (w)                          460      0.11              0.29              0.13

Summaries of three analyses of variance, each treating phrases that were alike in the number
of phonemes in a syllable but dissimilar in the number of syllables in a phrase.


TABLE 4

                                           Amount of delay (sec.)
Type of phrase             First  0.00  0.03  0.06  0.09  0.12  0.15  0.18  0.21  0.24  0.27  0.30
Five VC syllables           1.55  1.51  1.65  1.87  2.01  2.23  2.32  2.28  2.47  2.45  2.33  2.17
Five CV syllables           1.43  1.35  1.55  1.68  1.66  1.77  1.91  1.91  1.78  1.76  1.79  1.75
Eight 2-sound syllables     1.84  1.70  1.91  2.17  2.30  2.42  2.62  2.62  2.58  2.64  2.61  2.45
Four 4-sound syllables      1.57  1.58  1.67  1.83  1.97  2.00  2.12  2.21  2.36  2.19  2.07  1.96
Five 3-sound syllables      1.53  1.64  1.75  2.00  2.16  2.36  2.33  2.22  2.31  2.28  2.26  2.14
Seven 3-sound syllables     1.92  1.88  2.00  2.21  2.45  2.57  2.56  2.68  2.64  2.70  2.38  2.36
Five 4-sound syllables      2.12  2.24  2.35  2.56  2.75  2.89  2.91  2.99  2.98  2.93  3.01  2.97
Ten 2-sound syllables       2.46  2.35  2.52  2.92  3.19  3.35  3.48  3.50  3.65  3.90  3.27  3.32

First: the initial zero-delay reading; 0.00: the zero-delay condition within the randomized series.

The mean duration of a phrase in each of eight sets of phrases under twelve experimental
conditions. Five phrases per set; N = 24 readers per set.


DISCUSSION AND SUMMARY 


This study treats only reading time as a criterion measure. 

The retardation in the rate of oral reading occasioned by delayed side-tone
grows progressively greater as longer delays are introduced into the speaker’s side-tone
circuit, up to approximately 0.21 sec. The effect is greater when the reading materials
are VC syllables than when they are CV syllables, and the disparity becomes more
pronounced with the longer delay times.
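The growing disparity between VC and CV phrases can be read directly from the first two rows of Table 4. The sketch below takes the difference between the two rows at each delay. It then averages that difference over short delays (0.06 sec. or less) and long delays (0.15 sec. or more). These cut-off points are chosen only for illustration and are not the author's.

```python
# Delay conditions of Table 4, in seconds. The first two columns are the two
# zero-delay means described in footnote 1.
DELAYS = [0.0, 0.0, 0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.30]

# Mean phrase durations (sec.) for the two sets whose order of sounds was controlled.
VC = [1.55, 1.51, 1.65, 1.87, 2.01, 2.23, 2.32, 2.28, 2.47, 2.45, 2.33, 2.17]
CV = [1.43, 1.35, 1.55, 1.68, 1.66, 1.77, 1.91, 1.91, 1.78, 1.76, 1.79, 1.75]


def vc_minus_cv() -> list:
    """Excess duration of VC phrases over CV phrases in each condition (sec.)."""
    return [round(vc - cv, 2) for vc, cv in zip(VC, CV)]


def mean_gap(max_delay=None, min_delay=None) -> float:
    """Average VC-CV difference over the delays that fall inside the given bounds."""
    diffs = [
        d for delay, d in zip(DELAYS, vc_minus_cv())
        if (max_delay is None or delay <= max_delay)
        and (min_delay is None or delay >= min_delay)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    print(vc_minus_cv())
    print(f"delays <= 0.06 sec.: {mean_gap(max_delay=0.06):.2f} sec.")
    print(f"delays >= 0.15 sec.: {mean_gap(min_delay=0.15):.2f} sec.")
```

On these figures the VC phrases take longer than the CV phrases at every delay. The average excess is roughly 0.14 sec. at the short delays and roughly 0.52 sec. at the long ones.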
Fig. 2. Mean retardation in reading short phrases as a function of side-tone delay (ordinate: retardation in reading, sec.; abscissa: side-tone delay time, sec.).


The foregoing instance was unique in this study in that the order of sounds was
controlled. When the reading matter was controlled only with regard to the length of the
syllable, neither phrases of 15-16 phonemes nor those of 20-21 phonemes produced
non-parallel trend-lines among the mean durations of phrases read while
the reader experienced the several successive amounts of side-tone delay.

The foregoing result made it feasible to plot all the data of the present study,
except the comparison of CV and VC materials, on a single graph.
Figure 2 shows the mean amount of retardation per phrase introduced by each of
the 10 delay times.
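Fig. 2 plots mean retardation per phrase against delay. One way to derive such values from Table 4 is sketched below. It subtracts each set's zero-delay duration from its duration at each delay and averages over the six sets outside the CV/VC comparison. The zero-delay baseline is taken as the mean of the two zero-delay columns, following footnote 1's treatment for the analyses of variance. Whether Fig. 2 used the same baseline is an assumption.

```python
DELAYS = [0.0, 0.0, 0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.27, 0.30]

# Table 4 rows other than the VC/CV comparison (mean phrase duration, sec.).
SETS = {
    "Eight 2-sound syllables": [1.84, 1.70, 1.91, 2.17, 2.30, 2.42, 2.62, 2.62, 2.58, 2.64, 2.61, 2.45],
    "Four 4-sound syllables": [1.57, 1.58, 1.67, 1.83, 1.97, 2.00, 2.12, 2.21, 2.36, 2.19, 2.07, 1.96],
    "Five 3-sound syllables": [1.53, 1.64, 1.75, 2.00, 2.16, 2.36, 2.33, 2.22, 2.31, 2.28, 2.26, 2.14],
    "Seven 3-sound syllables": [1.92, 1.88, 2.00, 2.21, 2.45, 2.57, 2.56, 2.68, 2.64, 2.70, 2.38, 2.36],
    "Five 4-sound syllables": [2.12, 2.24, 2.35, 2.56, 2.75, 2.89, 2.91, 2.99, 2.98, 2.93, 3.01, 2.97],
    "Ten 2-sound syllables": [2.46, 2.35, 2.52, 2.92, 3.19, 3.35, 3.48, 3.50, 3.65, 3.90, 3.27, 3.32],
}


def retardation_by_delay(sets: dict) -> dict:
    """Mean extra phrase duration at each non-zero delay, relative to each set's
    baseline (assumed here: the mean of its two zero-delay columns)."""
    result = {}
    for col, delay in enumerate(DELAYS):
        if delay == 0.0:
            continue  # the two zero-delay columns form the baseline
        extra = [row[col] - (row[0] + row[1]) / 2 for row in sets.values()]
        result[delay] = sum(extra) / len(extra)
    return result


if __name__ == "__main__":
    for delay, value in retardation_by_delay(SETS).items():
        print(f"{delay:.2f} sec. delay: {value:.2f} sec. retardation")
```

Computed this way, retardation rises steadily from about 0.13 sec. at 0.03 sec. delay to about 0.85 sec. at 0.21 sec. It is slightly higher (about 0.87 sec.) at 0.24 sec. and falls at the two longest delays.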
The present results are in some measure inconsistent with the notion that the syllable
is a crucial unit in the monitoring of speech while reading under conditions of
delayed side-tone. The reasoning is as follows.
Syllables of more sounds take longer to say than do syllables of fewer sounds.
If the syllable were the basic unit in monitoring one’s speech, the amount of side-tone
delay that produces the most deleterious effects in reading should relate to the duration
of the syllable. The parallel trends, i.e. the absence of interaction, in the data of
Tables 2 and 3 do not support this expectation. Contrariwise, in one special
instance the syllable does alter the effect of delayed side-tone, viz. the instance of CV vs. VC
syllables.
VERBAL DYSFUNCTION IN MENTAL ILLNESS:
A COMPARATIVE STUDY.*


J. C. RAVEN 
The Crichton Royal, Dumfries 


The unusual ways in which patients explain and use words throw light on the
different forms of thought disorder noticed in structural and functional mental
illnesses. The frequency of unusual qualities can be an aid to differential diagnosis,
and sometimes prognosis.


INTRODUCTION 


Language bridges the gap between observable behaviour and personal experience. 
For the psychologist, a critical study of the ways in which people use and explain 
words is therefore the key which opens up the whole question of thinking and thought 
disorder to systematic comparative study under controlled conditions — the essence 
of scientific procedure. 

Psychologists would agree with Wechsler (1939) that “in defining a word, a subject
gives us more than its mere meaning. In many instances he tells us a good deal about
himself, or at least about the quality and character of his thought processes. This,”
he says, “is particularly true in the case of schizophrenics, the formal aspect of whose
language disturbance is frequently diagnostic.”

Unfortunately Wechsler was not able to say what a person’s definitions of words 
told us about his thought processes. Feifel (1950) distinguished five classes of 
Vocabulary Test responses, but Moran and Blake (1952) found that Feifel’s qualitative 
categories showed no differences between schizophrenic patients and normal adults, 
in spite of the fact that experienced clinical psychologists could differentiate between 
them with more than chance frequency. 

As early as 1930, Babcock used a comparison between a person’s vocabulary test 
results and the results of tests involving speed of mental work as a means of assessing 
reduced intellectual efficiency. By 1939, Simmins, Davidson and others had found 
that schizophrenic and even manic depressive patients showed statistically significant 
discrepancies between their vocabulary and their performances on other mental tests. 

By 1957 Orme had demonstrated that “while elderly depressives appear to be
functioning like the normal aged, the seniles show a highly significant disruption of the
ability to recall even the meanings of words used in every-day speech”. His conclusion
is that “the decline of verbal ability may prove to be a more direct measure of this
disruption than disturbances of intellectual functioning.”


* Script of a paper read at the Glasgow Meeting of the British Association, September 1958. 
PRINCIPLES OF ENQUIRY


Those who criticise a Vocabulary Test on the ground that, in comparing the ways
in which people use words, the judgements made are somehow too “subjective”
to be scientific, should bear in mind that Vocabulary Tests show high consistency,
reliability and agreement between markers under conditions in which apparently
more “objective” tests sometimes show deplorably low consistency, reliability, or
even agreement between investigators.

The problem is that a word’s meaning varies according to the “context” or
“field” in which it is used. Fig. 1 illustrates the lines along which verbal communication
between people normally takes place. Whether we call this “dynamic interpretation”
or “hypothetico-deductive reasoning”, the important thing to keep in
mind is that language is a form of communication between two or more people in
which a word’s use and interpretation depend upon its context.


Fig. 1. Normal field co-ordinates in using words.


Fig. 2 shows the ways in which past misadventures and inadequate objectives, as 
well as present conditions of mental stress or vulnerability, may each give rise to 
unusual qualities in a person’s use of words. 


Past misadventures influencing unusual explanations or uses of words can be of
three kinds :—

(1) A person may have missed the opportunity to become familiar with a word, from 
insufficient contact with people speaking the language, or from poor education.

(2) For one reason or another a person may not have been interested in social
contacts, or at least in verbal forms of communication.





220 Verbal Dysfunction in Mental Illness 


Fig. 2. The field of thought in which words cease to be used as other people use them. (Legible labels in the figure include “Present”, “Rigidity or Mobility of Ideas” and “Demands on Attention”.)


(3) He may have suffered an injury or illness affecting his recall of knowledge or 
his present ordering of words. 


Inadequate objectives can influence a person’s explanation and use of words in 
at least two ways :— 

(1) He may have little or no objective directing his actions. In this case his
response may be repetitive, irrelevant, or vaguely circumstantial.

(2) He may be confused between two or more objectives, or his intentions in using
words may be abnormal. In this case words may be given “double meanings”
or acquire personal significance, with the result that their explanation tends to be
oblique, or “past the point”.


In the field of contemporary events, failure to understand and use words as other
people use them can arise in three ways :—

(1) It may arise from rigidity, or excessive mobility of ideas. 

(2) It may equally well arise from exhaustion, or from excessive demands on 
attention. 

(3) Failure to understand and use words like other people may arise from unusual 
values, the feelings they awaken, and the ideas they are used to communicate or 
conceal. 


AGREEMENT BETWEEN INVESTIGATORS 


As the result of selecting 17 words from the standard Mill Hill Vocabulary Scale,
it has been possible to compare patients’ responses to them with the responses given
by normally healthy people. Apart from deafness, speech defects or aphasias, 17
categories of structural deviations or functional anomalies, shown in Fig. 3, have been
distinguished.

Fig. 3. Unusual qualities in responses to the short form of the M.H.V. Scale.
Unusual responses (“U”):
  Structural deviations from accepted phraseology: 1. Stylized language (S); 2. Disordered syntax (D-syn); 3. Nonsense words (N).
  Functional anomalies: 4. Perseveration of ideas (P); 5. Rigidity of expression (R); 6. One-way chain associations; 7. Confluence, or “telescoped ideas” (T).
  Further categories: 8. Structurally vague (Vg); 9. Poverty of expression (Pov); 10. Circumstantial talk (Cir); 11. Echo responses (Echo); 12. Dissipated responses (Dis); 13. Bizarre content (Bz); 14. Psychological intrusions (D-p.i.); 15. Geographical intrusions (D-g.i.); 16. Negativistic responses (Neg); 17. Erroneous pronunciation or choice of a word.
  Shown separately: deafness, speech defects, aphasias.

The obvious question is: How far do markers independently detect the same unusual 
qualities? To answer this question, pairs of markers were given, independently, groups 
of at least 25 Vocabulary Test records, and were asked to assess them in accordance 
with the standard technique for comparing unusual responses. No other assistance 
was given to them. The results were surprising. 

One pair of markers separately scored the responses given by 27 patients suffering 
from senile or arteriosclerotic dementia, and 27 normally healthy old people, matched 
for age and sex. Neither marker could detect more than 9 unusual responses in the 
latter group, 4 of which were given by one person. In the group of demented patients 
they separately detected 83 and 98 unusual responses. 

In scoring the responses given by a mixed group of 56 psychotic patients, two
markers agreed in detecting and classifying 105 responses showing unusual qualities.
Separately, they also agreed in detecting, but disagreed in classifying, a further 191 
unusual responses. Subsequently agreement was reached concerning 136 of these, 
with the result that joint agreement was finally reached concerning 241 examples of 
unusual qualities in the responses. It was a discussion of the points on which 
these markers disagreed which led to distinguishing and ultimately defining the 17 
qualitative categories shown in Fig. 3. 
Using these categories, a third pair of markers separately compared and scored
the responses given by 25 leucotomized schizophrenic patients and 18 acute
schizophrenic patients, the latter tested before receiving Insulin treatment. They agreed in detecting
and classifying 248 responses showing unusual qualities. In addition, they
subsequently agreed that one marker had noted 31 unusual qualities in these responses
which the other had overlooked and, conversely, that the second marker had noticed
90 unusual qualities which the first marker had either overlooked or considered too
trivial. Both markers finally noted 369 instances of unusual qualities in the patients’
responses.
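The marker-agreement totals reported above are simple sums. The proportion of finally accepted instances on which both markers agreed at once gives a rough index of initial agreement. That index is not a statistic reported in the paper. The function names below are illustrative.

```python
def pooled_total(agreed_at_once: int, *accepted_later: int) -> int:
    """Instances finally accepted: initial agreements plus each batch accepted later."""
    return agreed_at_once + sum(accepted_later)


def initial_share(agreed_at_once: int, final_total: int) -> float:
    """Fraction of the finally accepted instances agreed on without discussion."""
    return agreed_at_once / final_total


# Second pair: 105 agreed outright; 136 of the 191 disputed classifications settled later.
SECOND_PAIR = pooled_total(105, 136)
# Third pair: 248 agreed outright; 31 and 90 items that one marker had overlooked.
THIRD_PAIR = pooled_total(248, 31, 90)

if __name__ == "__main__":
    print(SECOND_PAIR, f"{initial_share(105, SECOND_PAIR):.0%}")
    print(THIRD_PAIR, f"{initial_share(248, THIRD_PAIR):.0%}")
```

For the second pair, about 44% of the 241 accepted instances were agreed at once. For the third pair, the figure is about 67% of the 369.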

Markers differed chiefly in the degree to which a response had to be irregular
before they were prepared to say that it was unusual. Responses showing more than
one unusual quality also presented difficulty. The majority of markers adopted the
principle of “one response one quality”. This may simplify statistics, but psychologically
it is quite unjustified and, according to the standard instructions, unnecessary.


INCIDENCE OF VERBAL DYSFUNCTION IN DIFFERENT CLINICAL CONDITIONS 


Table 1 shows the clinical classes constituting the experimental population, arranged 
in order according to the proportion of patients giving more than one response found 
to have unusual qualities. The groups are small, but they illustrate the method of 
handling the data and the kind of information obtained. A group of 16 acute 
schizophrenic patients tested after, as well as before, Insulin treatment are shown in 
a second entry. 


TABLE 1

                                                               Incidence of
Clinical Class                        No. of      Mean Score   Verbal
                                      Patients                 Dysfunction
Hebephrenic Schizophrenia                12           7.0         .92
Organic Psychoses                        10           9.9         .90
Dementia — Senile                        15           5.9         .80
Dementia — Arteriosclerotic              12           4.8         [illegible]
Chronic Leucotomized Patients            25           5.0         .72
Manic Depressives                        10           8.1         .60
Schizophrenia Simplex                    13          10.0         .46
Acute Schizophrenics before Insulin      18           7.4         .44
Paranoid Schizophrenia                   11           9.5         .36
Acute Schizophrenics after Insulin       16           9.1         .19


Analysis of the responses of 126 psychotic patients given the short form of the Mill Hill 
Vocabulary Scale. The experimental groups studied are arranged in order according to 
the frequency of verbal dysfunction. 
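The incidence column in Table 1 is the proportion of patients in each group who gave more than one unusual response. Each printed value should therefore correspond to a whole number of patients. The sketch below recovers that number for each group, for example 8 of the 18 acute schizophrenics before Insulin, as the text states. The caption says the table is ordered by incidence. If that ordering is strict, the illegible entry for the 12 arteriosclerotic patients must lie between .72 and .80, and only 9 of 12 (.75) satisfies this. That value is an inference from the table's ordering, not a figure recovered from the print.

```python
# (clinical class, number of patients, printed incidence, or None if illegible)
TABLE_1 = [
    ("Hebephrenic Schizophrenia", 12, 0.92),
    ("Organic Psychoses", 10, 0.90),
    ("Dementia, Senile", 15, 0.80),
    ("Dementia, Arteriosclerotic", 12, None),
    ("Chronic Leucotomized Patients", 25, 0.72),
    ("Manic Depressives", 10, 0.60),
    ("Schizophrenia Simplex", 13, 0.46),
    ("Acute Schizophrenics before Insulin", 18, 0.44),
    ("Paranoid Schizophrenia", 11, 0.36),
    ("Acute Schizophrenics after Insulin", 16, 0.19),
]


def implied_counts(n: int, incidence: float) -> list:
    """Patient counts k whose proportion k/n is within rounding of the printed value."""
    return [k for k in range(n + 1) if abs(k / n - incidence) <= 0.005 + 1e-9]


def counts_between(n: int, low: float, high: float) -> list:
    """Patient counts k with low <= k/n <= high (used for the illegible entry)."""
    return [k for k in range(n + 1) if low <= k / n <= high]


if __name__ == "__main__":
    for name, n, incidence in TABLE_1:
        if incidence is None:
            # Neighbouring rows of the ordered table bound the missing value.
            print(f"{name}: k in {counts_between(n, 0.72, 0.80)} of {n} (inferred)")
        else:
            print(f"{name}: k in {implied_counts(n, incidence)} of {n}")
```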
From the third column in Table 1 it will be seen that the group of patients
diagnosed as suffering from schizophrenia simplex obtained the highest mean score 
on the test as a whole. The paranoid schizophrenics and organic psychoses came next. 
Hebephrenics yielded a smaller number of correct responses than either of the other 
two types of schizophrenia studied. Arteriosclerotics gave the smallest number of 
correct responses. 

From the last column in Table 1 it will be seen that the hebephrenics showed the 
greatest degree of verbal dysfunction. The organic psychoses came next. This, however,
is a combined group, and verbal dysfunction seemed to be more typical of organic
lesions than of reversible organic conditions. The lowest incidence of verbal dysfunction 
occurred in the group of acute schizophrenics after Insulin treatment. 

Of the 18 acute schizophrenic patients tested before Insulin treatment, 8 showed 
more than one sign of verbal dysfunction. One patient gave 13 responses, all showing 
marked distractions due to psychological intrusions, or dissipated thought. Clinically, 
this interference with thinking became so distressing that the patient committed 
suicide. In contrast to this, a patient whose responses showed only 2 unusual 
qualities broke off treatment. 

For the 16 patients who completed Insulin treatment, the total number of responses 
accepted as showing normal ability to explain and use words increased from 119 
to 146. In 3 cases recovery of ability to explain and use words was the marked 
effect of treatment. In the responses given by the other patients, the presence or 
absence of unusual qualities was found to be more diagnostic. Before treatment 4 
patients showed marked signs of verbal dysfunction in from 6 to 12 of their responses. 
After treatment no patient gave more than 2 responses showing any unusual qualities. 
In particular, treatment removed any tendency towards dissipated, vague, inadequate 
responses. 


QUALITATIVE DIFFERENCES IN PATIENTS’ UNUSUAL RESPONSES 


Although verbal dysfunction was characteristic of certain patients, more than of 
particular clinical groups, some kinds of dysfunction were found to occur more 
frequently in certain clinical conditions. 

Disordered syntax occurred most frequently in organic psychosis and senile 
dementia. For example, a patient suffering from the after-effects of a head injury 
said the word “View” meant: “What you can see on your own eyes as you look
and regard everything you can see in the space of your own eyes.”

Perseveration occurred most frequently in senile dementia, but it also occurred
frequently in hebephrenia and in chronic leucotomized schizophrenics, and was found to
be common to, though less frequent in, all classes of mental illness. It can in fact
occur under any conditions of fatigue or exhaustion. Patients often elaborated some
recurrent idea in successive responses. Thus one patient said for:
“Cruel”—“To try and believe what you really are. It’s cruel sometimes to
be kind.”

“Near”—“To cure kindness you’ve got to be keen on kindness. Sometimes
it’s cruel.”

“Shrivel”—“You shrivel up if you don’t believe in what you say. Chivalry—
it’s sometimes more cruel to act age instead of beauty.”

Bizarre Content was typical of hebephrenia, but occurred in other groups also. 
A good example of this is a patient who, for the word “Mingle”, said: “You could
say ‘mingle your eyebrows with mine’”.

Rigidity of expression was noted if the same construction was used throughout 
successive responses. It was frequent amongst manic depressive patients, and 
occurred in manic as well as depressed phases. To a less extent it occurred amongst 
chronic leucotomized schizophrenics. 

Poverty of Expression, in which a patient’s response amounted to little more than 
a monosyllable, was found chiefly in depressive psychosis. 

Circumstantial Talk occurred most frequently in seniles, and, to a less extent, in 
arteriosclerotic dementia. These were talkative, superficial, often long, unstructured 
responses, which never got round to the meaning of the word the patient was asked 
to explain or use, and sometimes digressed into childhood anecdotes. 

Structurally vague responses were general to all the clinical classes. They
distinguished patients from healthy people, but did not differentiate one clinical class
from another. These were ambiguous explanations of a word which could not be
said to be grammatically incorrect, although they were comprehensible, if at all,
only in an attenuated, metaphorical or esoteric sense, as for example when a patient
said “Liberty” meant “Having no difficulty about doing a thing. You can do what
you like on a thing. Liberty is the thing I find very easy to do. It’s very easy to
manage.”

Distractions due to Intrusions of Psychological or Geographical Origin were general 
to all classes of patient. They formed a large and interesting group of responses in 
which either the patient’s thoughts or his surroundings appeared to distract him from 
giving a satisfactory explanation of a word’s meaning, or made it difficult for him to 
use it as other people do. It was as if excessive introversion or extraversion
interfered with a person’s normal explanation and use of a word. A woman, for example,
said that “Virile” meant: “Manly, a person that’s virile can frighten the thoughts,
but they can’t obey them always.” Another patient said that “Perpetrate” meant:
“To mess about, to waste. At times everybody perpetrates because their mind is
disturbed and they can’t concentrate on what they are doing”; also that “Verify”
meant: “Thinking of turning round which was in his mind”. As an example of
an intrusion of geographical origin, one patient said “Mingle” meant: “A thing you
could take—that’s easy—the thing on the table”, and another patient said that
“Construe” meant: “You try out holding that form on a sensible rest!”
Chain Associations, Telescoped Ideas, Nonsense Words, Dissipated and Echo
responses occurred in several groups, but with low frequency.

Stylized Language and Negativistic responses occurred more than once in some
groups.


COMMENTARY 


I think we have overcome the major defects of previous studies, and have developed 
a technique which provides us with qualitative categories of verbal dysfunction 
suitable for quantitative comparisons. It would be premature to say more than 
that we are in a position to undertake larger and more detailed studies of verbal 
dysfunction occurring in mental illness. 
THE PREDICTABILITY OF WORDS IN CONTEXT AND
THE LENGTH OF PAUSES IN SPEECH 


FRIEDA GOLDMAN-EISLER 
University College, London 


Fluency and hesitation in spontaneous speech have previously been shown to be
symptomatic of variations in word transition probability. The purpose of the
present experiment was to corroborate this conclusion by evidence demonstrating
the influence of transition probability (amount of information) on selective behaviour
stimulated in an experimental situation.

An experiment in sentence completion was designed in such a way as to recreate 
the conditions for word selection in sentences. As a result word transition probability 
was shown to be related not only to incidence but also to the length of hesitation 
pauses within sentences. The completion of gaps substituted for words that had 
originally been preceded by pauses required a significantly longer period of hesitation 
than the completion of gaps substituted for words which had been uttered fluently. 
A relation was thus shown to exist between periods of hesitation before verbalisation 
in different persons performing different operations within the same linguistic setting. 
The conditions under which this relation has been shown to hold have been found to
be those of successful anticipation of the original speaker’s intentions.


As a result of a previous experiment (Goldman-Eisler, 1958) hesitation pauses in
spontaneous speech, “an aspect of behaviour of speakers, presumably related to a
subjective state of the speaking organism”, were shown to be related to an aspect of
objective language, namely transition probabilities dependent on word frequency in the 
language at large, linguistic structure and context. Sequences of words, uttered 
fluently, were easily predicted from the context (i.e. had low information content) 
by guessers who knew nothing of the speaker’s intentions, the combination of words 
in such sequences appearing to be shared by the language community. On the other 
hand, where guessers found themselves at a loss when predicting the next word as 
originally spoken (i.e. where this word had a high information content) the original 
speaker also seemed to have been at a loss for the next word, for it was at these 
points that he tended to hesitate. 
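The link between predictability and information used above can be made concrete. The sketch assumes that a word's transition probability is estimated as the proportion of guessers who supplied the original word from the preceding context. A word's information content is then -log2(p) bits. A word guessed by nearly everyone therefore carries little information, and a rarely guessed word carries much. The guess counts in the example are hypothetical.

```python
import math


def transition_probability(correct_guesses: int, guessers: int) -> float:
    """Assumed estimator: share of guessers who produced the original word."""
    if guessers <= 0 or not 0 <= correct_guesses <= guessers:
        raise ValueError("need guessers > 0 and 0 <= correct_guesses <= guessers")
    return correct_guesses / guessers


def information_bits(p: float) -> float:
    """Self-information -log2(p); a word nobody guessed is treated as unbounded."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return math.inf if p == 0.0 else -math.log2(p)


if __name__ == "__main__":
    # Hypothetical guess counts out of 10 guessers for three words.
    for word, hits in [("the", 9), ("answer", 2), ("quite", 1)]:
        p = transition_probability(hits, 10)
        print(f"{word!r}: p = {p:.2f}, information = {information_bits(p):.2f} bits")
```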

It was concluded from these results that fluency and hesitation were symptomatic of 
the amount of information contained in the words related; and, functionally, of 
whether the respective utterance was “old, well organised” (Jackson, 1932), i.e. 
practised and automatic, or “new, now organising”, i.e. speech created for the 
occasion and fitted to a specific meaning content. 

It seemed to the writer that such a conclusion should be corroborated by evidence
demonstrating the influence of objective language (i.e. information measured in terms
of transition probabilities) on selective behaviour stimulated in an experimental
situation.











EXPERIMENT 


To this purpose a new experiment was designed with a view to recreating the 
conditions for word selection in sentences (a) whose degree of fluency and hesitation 
in spontaneous utterance was known, and (b) the transition probabilities of whose 
words had been previously determined. 


MATERIAL 


The material of this experiment consisted of four sentences which had originally 
been uttered in spontaneous speech and recorded. The duration of hesitation pauses 
interrupting their flow was measured and the transition probabilities of each of the 
words determined in the manner described in a previous paper (Goldman-Eisler, 1958). 

These sentences were subjected to two different treatments. In treatment I words of 
low transition probability were omitted while in treatment II the omitted words had 
a high transition probability. Thus there were for each sentence two different 
incomplete versions: one with gaps replacing words of high information content, 
and one where the gaps replaced words containing little information. The experiment 
consisted of asking the subjects to complete the sentences while reading them. To
study the influence of amount of information (level of transition probability) on the
ease with which incomplete sentences could be completed, it was necessary to 
equalise the number of actual words given in each sentence, as well as to maintain 
a fair balance in the distribution of gaps between words. This was easier than 
might have been expected, owing to what seems to be an inherent quality in the 
structure of language. About 40% of all words had transition probabilities of less 
than 0.10 and 50% of less than 0.20. Most of these were distributed between words 
of high expectancy in such a way that the succession of words of low and high 
transition probability showed a balanced alternation. 
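The two treatments need the same number of given words. One simple way to achieve this is to rank a sentence's words by transition probability. Treatment I then blanks the lower half and treatment II blanks the upper half. This sketch illustrates the idea and is not necessarily the procedure the author followed. The example sentence and its probabilities are invented for illustration.

```python
def make_versions(words, probs, gap="____"):
    """Return (treatment_I, treatment_II) versions of a sentence.

    Treatment I blanks the half of the words with the lowest transition
    probability; treatment II blanks the half with the highest. Both versions
    therefore give the reader the same number of words.
    """
    if len(words) != len(probs):
        raise ValueError("words and probs must have the same length")
    k = len(words) // 2
    ranked = sorted(range(len(words)), key=lambda i: probs[i])  # ascending probability
    low = set(ranked[:k])
    high = set(ranked[len(words) - k:])

    def blank(indices):
        return " ".join(gap if i in indices else w for i, w in enumerate(words))

    return blank(low), blank(high)


if __name__ == "__main__":
    # Invented sentence and invented transition probabilities.
    words = "I think that the answer is quite obvious".split()
    probs = [0.30, 0.05, 0.60, 0.70, 0.08, 0.90, 0.04, 0.50]
    treatment_i, treatment_ii = make_versions(words, probs)
    print("I: ", treatment_i)
    print("II:", treatment_ii)
```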


PROCEDURE 


The sentences were presented to the subjects, who were instructed to read them,
substituting for the gaps in the text the words most suitable in the context as they 
went along. They were instructed to do this at a conversational pace, being 
permitted to pause, as speakers do in spontaneous conversation, but asked to keep 
in mind an imaginary listener waiting for the sentence to come to an end in reasonable 
time, and to maintain the illusion of continuity found in connected spontaneous 
speech. 

The experimental design shown in Table 1 was adopted in order to eliminate from 
the experiment the effects of learning and the influence of the order of sentences 
having gaps concealing words of differing transition probabilities. 
TABLE 1

Order of reading sentences by subjects A, B, C, D, E, F, G, H

Sentences with gaps replacing words of high transition probability
F1    A  B  E  F
F2    G  H  C  D
M1    E  F  A  B
M2    C  D  G  H

Sentences with gaps replacing words of low transition probability
F1    C  D  G  H
F2    A  B  E  F
M1    G  H  C  D
M2    E  F  A  B
RESULTS 




1. The readings and simultaneous completion of the sentences were recorded on 
tape and the durations of periods of speech and silence transmitted through a pen 
recorder to teledeltos paper from which the durations of the interrupting pauses were 
then taken. 

The rate of reading, in syllables per second, was calculated from the duration 
of each reading and it was found that the logarithms of these rates were linearly
(and inversely) related to the logarithms of the sums of the duration of the pauses. 
An analysis of variance was computed using, as variate, the log (rate) since its 
distribution was sensibly normal and its variance homogeneous. 

Where the gaps stood for words of high transition probability, the mean length
of hesitation before substituting words for gaps was 20 seconds, corresponding to a
mean production rate of 1.75 syllables per second; where the gaps stood for words of
low probability, it was 37 seconds, or 0.87 syllables per second. The difference
was significant at the level of P = 0.01 (F = 8.8), but some of the variance
(significant at the 0.05 level) was between sentences. Individual differences
between readers played little part in determining the length of hesitations. Thus
transition probability, or the amount of information contained in the words of a sentence,
was shown to be related not only to the incidence but also to the length of hesitation
pauses, when these occurred before filling gaps in sentences whose transition probabilities
were known.

2. The times taken by the subjects to fill the gaps in the sentences varied widely 
(between 1 and 92 seconds, with one gap of 165 seconds in the tail of the distribution) 
and were obviously of a different order of duration from the pauses made in the 
original speech. While generally longer pauses were to be expected from subjects 
involved in recreating sentences formulated in brains other than their own, it was
still conceivable that the difficulties of selecting the next word facing the original 
speaker and causing him to pause might be reflected in the lengths of the hesitation 
of subjects trying to recreate a sentence by filling in the gaps. 

As mentioned before, about one half of the words in each sentence had been
omitted and replaced by gaps, the omitted words being of either low or high transition
probability; there were thus many more gaps to be filled than there had been pauses
in the original speech. To answer the question whether the length of pauses made by
subjects in this experiment, reading and filling the gaps as they went along, had any
relation to the length of pauses in the original speech, the former were divided into
two classes: (a) those that preceded words originally uttered fluently; and
(b) those that preceded words originally preceded by pauses. The relation
proved significant (χ² = 6.0 and 6.1, P = 0.02) for two of the sentences. Here
the completion of gaps concealing words that had originally been preceded by pauses
required a significantly longer period of hesitation than the completion of gaps
concealing words uttered fluently. For the other two sentences no such relationship
existed.
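The text does not say how the pause lengths were cross-classified for the χ² test. One plausible arrangement, assumed here, is a 2 x 2 table of long versus short pauses (split, for instance, at the median). The other factor is whether the concealed word had originally been preceded by a pause. The sketch computes Pearson's χ² for such a table and its one-degree-of-freedom tail probability. The counts in the example are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the assumed table
              long  short
    paused:     a     b    (word originally preceded by a pause)
    fluent:     c     d    (word originally uttered fluently)
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must have a non-zero total")
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of chi-square with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    stat = chi_square_2x2(12, 8, 6, 14)  # hypothetical counts
    print(f"chi-square = {stat:.2f}, P = {p_value_1df(stat):.3f}")
    for reported in (6.0, 6.1):
        print(f"chi-square = {reported}: P = {p_value_1df(reported):.4f}")
```

For χ² = 6.0 and 6.1 the tail probabilities both fall between 0.01 and 0.02. This is consistent with P = 0.02 being reported as a conventional table bound.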

When explanation of this fact was sought, a striking difference between the two 
groups of sentences became immediately evident. In the first pair, a high proportion 
of correct solutions (i.e. where the words selected to fill the gaps were identical with 
those used in the original speech) had been made, 89% and 79.2% respectively, 
where words of high transition probability had been omitted ; and 46% and 36.2%, 
where words of low transition probability had been removed ; while in the second pair 
the proportion was considerably lower, 59.5% and 52.5% for words with high 
transition probability and 13% and 13.4% for words with low transition probability. 


DISCUSSION AND CONCLUSIONS 


In order to interpret these results, we need to understand the implications of 
the correct completion of the gaps in the experimental sentences. A high proportion 
of correct words in completing them seems to indicate that the reader’s (or subject’s) 
selection of words for the purpose was determined by a sentence schema very similar 
to that which had determined the word choice of the original speaker. One might 
say that the reader’s verbal thinking was guided along the same lines of meaning 
as that of the original speaker. On the other hand a low proportion of correct 
words chosen in completing the experimental sentences seems to indicate that the 
reader who was thus unsuccessful found it difficult to anticipate, from the skeleton 
sentences presented to him, in what direction the original speaker’s thinking had 
been moving. 

It is thus made evident that subjects filling in the gaps in incomplete sentences 
while reading them have a significantly greater chance of approximating the original 
version whenever the sentence schema presented allows for a reconstruction of the
meaning according to the original speaker’s intention. Further, the lengths of the 
hesitations of subjects whose guessing showed the anticipation of these intentions— 
in other words, who proved themselves to be thinking along the same lines as the 
original speaker—were related to the lengths of the pauses made in the original 
speech. In those sentences, on the other hand, in which omission of words reduced 
information content to the extent of obstructing meaningful reconstruction, i.e. where 
subjects were left without constraints to direct their task of sentence completion, 
there was no relation between the length of pauses made while performing it and 
the pauses in the original speech. It appears that those who think alike, in the 
matter of pauses, behave alike. 


A relation is thus shown to exist between periods of hesitation before verbalisation 
in different persons performing different operations within the same linguistic setting. 
The conditions under which this relation has been shown to hold have been specified 
as those of successful anticipation of the original speaker’s intentions. 


While in the previously reported experiment (Goldman-Eisler, 1958) a relation had 
been established between pausing (an aspect of the behaviour of speakers) and transition 
probability (an aspect of linguistic structure), the present result goes beyond this. 
It shows that different individuals operating on the same language material will, 
provided they are subjected to the constraints of identical meaning structures, respond 
in a similar way to the tasks of creating speech and of reconstructing this speech 
from given fragments of it. The significance of the similarity of this response may be 
better understood if we consider the implications of these two linguistic operations. 
The analysis of the results of the previous experiment led to the conclusions that the 
process of verbalisation consists of two parts (Goldman-Eisler, 1958) and that every act of 
speech is based on an anticipatory plan possessing a specific structure. Hesitation pauses 
were conceived of as indicating the lag between the two halves of the verbalising 
process, the “subjective” and the “objective” (in the terms of Hughlings Jackson), 
or mentation and action. It was concluded that “in old and acquired forms of speech”, 
where the process is automatic and its first half subconscious, speech is highly 
integrated and utterances are instantaneous. Where the words are being fitted to the 
proposition anew, the lag between anticipatory and actual verbalisation (Jackson’s “subjective” 
and “objective”) may be expected to be a function of the degree of 
indecision in selecting the words to be fitted. 


If we apply this hypothesis to the present results, we should equate the task of 
completing the sentences with the second stage of the verbalising process. Moreover, 
when the original speaker’s intentions were successfully anticipated in the task of 
sentence completion, the durations of pauses made in the two speech operations were 
related: in completing the sentences, subjects hesitated significantly longer when filling 
the gaps concealing exactly those words which in the original speech had been preceded 
by pauses. This seems to indicate that both types of pause, those interrupting the 
utterance of sentences in spontaneous speech and those preceding the completion of gaps 
in given sentence schemas, are in fact identical in function, i.e. are related to the 
selection of words to be fitted into an existing structure. 
BOOK NOTICE 


R. Klein and W. Mayer-Gross, The Clinical Examination of Patients with Organic 
Cerebral Disease. Cassell, London, 1957. pp. xi + 96. 15s. 

This book is of interest not only to workers concerned with the investigation of 
patients suffering from disorders resulting from lesions of the brain. It should be 
read by all those with an interest in the diversity of speech-thought relationships. In 
presenting in a compact form the methods of investigation required in the examination 
of patients exhibiting disturbances of the more highly organised forms of behaviour 
as the result of organic cerebral disease, the authors have set themselves an aim which 
will be appreciated by those who are familiar with the way theories have, in this 
field, been reflected in the observations and case descriptions reported. The authors 
claim to include in their scheme all the useful methods from the various schools of 
thought, and describe “a standard procedure by which all essential data can be 
collected, as unbiased by theory as possible”. 

The interpretations and descriptions of the tests of organic cerebral diseases are 
grouped under nine main headings: Aphasia, Disorders in the Visual Sphere, 
Temporal Disorder, Disorders of Motor Actions at a Higher Level, Tactile Agnosia, 
Disturbance Connected with the Body Scheme, Disturbance in the Use of Numbers 
(Acalculia), Rules of Dominance, and General Disturbances. The lucidity in the 
descriptions of clinical detail and of the distinctive signs marking the sub-groups from 
each other is impressive. At the same time the authors are careful to warn against 
singling out any one particular sign and they emphasize that in a complex syndrome 
such as aphasia, for example, every symptom has to be viewed according to the 
context in which it occurs. Thus an understanding of the processes preceding speech 
formulation is required if the mode of operation of a defective function is to be 
properly assessed. When one notes such well-founded observations as, for instance, 
that a precise scheme of thought which requires exact formulation will, in the 
presence of inadequate language resources, inhibit speech, while a vague scheme of 
thought will mask the aphasic defect, one hopes for a more extensive exchange of 
ideas in studies of aphasic phenomena and of the relation of normal thought and 
language. 







PUBLICATIONS RECEIVED 


Abstracts of English Studies, 1 (1958), 5-8. 

Leuvense Bijdragen, 46 (1956-57), 3/4; 47 (1958), 1/2. 
Mechanical Translation, 4 (1957), 1/2, 3. 

Methodos, 9 (1957), 33/34, 35/36. 

Norsk Tidsskrift for Sprogvidenskap, 17 (1954); Suppl. 4 (1956). 
Philologica Pragensia, 1 (1958), 1-3. 

Psychometrika, 23 (1958), 1-2. 

Revue de Linguistique, 1 (1956) ; 2 (1957). 

Slovo a Slovesnost, 19 (1958), 1-3. 

Studia Romanica, 1 (1956), 1-2; 2 (1957), 3-4. 


Bolinger, Dwight L. (1957). Interrogative Structures of American English. 
Publication of the American Dialect Society, 28 (University of Alabama Press). 

Klein, R. and Mayer-Gross, W. (1957). The Clinical Examination of Patients with 
Organic Cerebral Disease (Cassell, London). 


INTERNATIONAL COLLOQUIUM ON COMMUNICATION 
AND LANGUAGE 


Paris, 22-25 January, 1959. 

This colloquium is being organized by the Faculty of Medicine of the University 
of Paris, with the collaboration of the Professor of Otorhinolaryngology, the Société 
Française de Phoniatrie, the Laboratory of Physiological Acoustics and the Professor 
of Neuro-Physiology at the Collège de France. It will take place at the Faculty of 
Medicine, 45 rue des Saints-Pères, Paris 6e, from 22 to 25 January 1959. 

The Colloquium will discuss communication and language from the point of view 
of information theory and the programme will deal with the following aspects of 
the subject: Mathematical and Philosophical, Physical, Physiological, Pathological, 
Psychological, Statistical. 

The President will be Professor Aubry. Papers will be read by Mme. Borel-Maisonny, 
Mr. Busnel, Prof. Fessard, Mr. Fournier, Prof. Fry (England), Mr. Lehman, 
Prof. Meyer-Eppler (Germany), Mr. Moles, Dr. Perdoncini, Prof. Rosenblith (U.S.A.), 
Dr. Vallancien. 

Further information may be obtained from: 

Dr. B. Vallancien, 
16 rue Spontini, 
Paris 16e. 





Clare o' Molesey Ltd. (T.U.), Molesey, Surrey. 










PECULIARITIES OF THOUGHT IN PATIENTS WITH 
SENSORY APHASIA 


E. S. BEYN 
Institute of Neurology, U.S.S.R. Academy of Medical Sciences 


This paper makes an attempt to characterize the intellectual peculiarities of patients 
with sensory aphasia. Long and detailed observation of more than 50 patients has 
shown that the most important factor in these cases is the inability to differentiate 
between speech sounds. This in turn affects other language levels, semantic, syntactic 
and grammatical, and also the personality and intellectual activity of the patients. The 
effects of aphasia on these features and on the comprehension of speech and the active 
speech behaviour of the patients are described. 


The problem of thought in aphasias is a highly interesting and pressing one, owing 
to the very close interaction between speech and thought, which is the subject-matter 
of many theoretical and experimental-psychological investigations. 

In spite of a relatively considerable number of researches devoted to this problem, 
there still exist conflicting viewpoints on the fate of thought in various forms of 
aphasic disorders. These conflicting viewpoints are, undoubtedly, connected with the 
different theoretical approaches of the authors to this problem and, primarily, to 
the question of the relationship between thought and speech in normal people, i.e., 
with whether the authors regard these processes as completely identical, as 
processes independent of each other, or as a unity in which both processes, in spite of 
their close interdependence, preserve their own inherent features.¹ 

Let us recall here some points of view on the question of the fate of thought in 
aphasias, including the sensory form which is of particular interest to us. 


The viewpoint of Goldstein (1948), for example, is well known: in conformity 
with the basic principles of gestalt psychology, he considered a “decline of categoriality 
of behaviour”, a change of the “abstract set”, as a general symptom of aphasia 
caused by any local lesion of the brain. 

Similarly, Head (1926) defined all aphasic disorders “as disorders of symbolic 
formulation and expression”, although he did not take such a categorical stand as 
Marie and Goldstein, especially with regard to sensory aphasia, which he designated 
as a syntactic form. He believed that this form is characterized by a relative intactness 
of the intellectual side and internal speech of the patient. 

1 The theory of Pierre Marie is a graphic example of a viewpoint which proceeds from the 
conception of the absolute independence of speech from thought, of their complete disunity. Under 
the influence of the Würzburg psychological school, which divorced “pure” thought from 
visual-sensory experience, Marie reduced the executive side of speech to strictly technical 
functions. Owing to this, he admitted only one form of aphasia (the sensory form), which 
he regarded as an “intellectual” one, as an intellectual defect. As to motor aphasias, he 
regarded them not as independent syndromes, but as anarthric disorders of speech. 

Isserlin (1929-1932) disagreed with Goldstein. Of a certain interest is his statement 
that the repetition of the experiments of Goldstein and Gelb with the classification 
of colours and objects did not confirm the invariable decline of categoriality of 
behaviour in patients with aphasia. In the opinion of Isserlin, some patients with 
brain traumas successfully coped with the task of finding the necessary words in 
spite of the difficulties which they encountered. 

Monakow (1929) emphasized that patients with sensory aphasia retain their intellectual 
orientation, but the connection between intact intellectual operations and verbal symbols, 
which cannot be called forth, is disrupted. Owing to this, the verbalization of 
thought proves to be impossible. 

Along with such conflicting viewpoints resulting from different theoretical concepts 
of relationship between speech and thought, it should be pointed out that in many 
cases the authors did not confirm their viewpoint by experiments. 

Conflicting views are also met with in the current literature (ranging from those which 
maintain the indissolubility of speech and thought to those which stress the absence 
of any close connection between them); most often, however, one meets statements 
concerning the erroneousness of the viewpoint of Goldstein and others, according to 
which brain lesions invariably result in derangements of the “abstract set”. Such 
considerations have been expressed, for example, by Weisenburg and McBride (1935), 
McFie and Piercy (1952), Konrad (1957), Critchley (1953), Luria (1947), Kok (1957), 
Kennedy (1936), and Bauer and Beck (1955). The last-mentioned authors decidedly 
object to Goldstein’s hypothesis concerning the inevitability and homogeneity of 
derangements of abstraction in patients with left-side lesions and aphasias. Moreover, 
like some other authors, they show that particularly pronounced derangements of 
abstraction are observed in right-side lesions of the brain (in right-handed 
patients with brain lesions not accompanied by aphasia). 

The publications of Schuell (1955) contain statements concerning the erroneousness 
of the viewpoint that patients “ with disturbances of auditory retention and memory ” 
(the sensory form) are regarded as mentally inferior. It is only the defects of speech 
which the author stresses in aphasias. 

Thus, the insufficiency of the concepts of a global lowering of the intellect and 
decline of personality to the level of concrete behaviour, as an indispensable feature 
of all aphasic disorders, becomes more and more evident. We witness a steady 
departure from the previously unreserved point of view concerning the complete 
identity of disorders of thought and speech in aphasias. Probably, as a reaction 
against the views of Goldstein, which not long ago were quite widespread, some 
authors regard intellectual defects exclusively as “ associated ” with defects of speech 
(Kennedy and others). Unfortunately, the views of some authors are not always 
corroborated by an adequate analysis of the specific features of intellectual activity 
in various forms of pathology of speech. Meanwhile, only a concrete analysis of 
the peculiarities of intellectual processes which are essentially connected with particular 
syndromes of aphasic disorder can help us properly to approach this complex 
problem in the theory of aphasias. Either/or points of view are, 
apparently, the least productive in resolving the question of the relationship between 
disturbances of the intellect and disturbances of speech caused by aphasias. 

In the present article we shall make an attempt to characterize the intellectual 
peculiarities of patients with sensory aphasia resulting from traumatic and vascular 
lesions of the brain.² We base ourselves on the concept of the unity of speech and 
thought, the unity of the significative and phonetic (phasic) aspects of speech; this, 
however, does not exclude their individual specific law-governed properties, or 
certain distinctions in their ontogenesis. We proceed also from the proposition that 
speech is not merely an external expression of thought (or its “garments”), that 
“thought is not only expressed by speech, but also accomplished by it”. This basic 
proposition, developed in Soviet psychology by Vygotsky, Rubinstein and others, 
facilitates the analysis of pathological dissociations. 

The data obtained in Pavlov’s school are of particular importance for the 
comprehension of the mechanism governing the activity of the auditory zones of the 
cortex, as well as of the psycho-physiological essence of the processes which underlie 
sensory aphasia. 

Experiments performed by this school demonstrated that the activity of the 
cerebral cortex, in particular of the auditory temporal systems, is not confined to 
a mere reception of acoustic stimulations. The cortex of the temporal region is an 
apparatus which subjects the acoustic stimuli to analysis and synthesis. Owing to 
this, a lesion of the cortex of the temporal lobe may leave the audition itself intact, 
and at the same time greatly disrupt the analysis and synthesis of sounds. 


In the works of Soviet authors, Boskis and Levina (1936), Luria (1947-1948), 
Beyn (1947) and others, sensory aphasia resulting from the lesion of the left-side 
temporal cortical systems is regarded as a disturbance of the higher forms of speech 
acoustic gnosis, as a derangement of reception, of analysis and synthesis peculiar to 
certain linguistic systems of phonemic signs. 

This point of view is based on the data of linguistics concerning the phoneme 
as a unit of the phonetic system of the language (Baudouin de Courtenay, 1822; 
Shcherba, 1912; Troubetskoi, 1939). 


According to linguistics, phonemes are elementary systems of phonetic signs which 
perform the function of significative differentiation in the language and in which the 
meaning is inseparable from the sound. 

The disintegration of the historically evolved system of the language, as a result 
of which the sounds of speech (phonemes) cease to perform their generalized function 
of significative differentiation, leads to a derangement of differentiation between sounds 
of speech, and hence to the whole syndrome of sensory aphasia. 

2 Protracted observations were carried out on more than 55 patients with sensory aphasia in 
the course of corrective training and experimental-psychological investigations. 


The principal characteristic feature of a patient with sensory aphasia, the lability of his 
speech, results from his inability to differentiate between speech sounds. Different 
words may sound the same to him and, conversely, the same word may sound 
different. 


One of our patients described this state in the following way: “If you pronounce 
the word ‘table’ three times in succession, every time I shall hear it differently.” 


The derangement of sound differentiation brings about a deterioration of significative 
differentiation as well. This, in its turn, determines the peculiar character of the 
patient’s system of speech (the peculiar agrammatism exhibited by patients with 
sensory aphasia). This is why the question of the fate of the significative side of 
speech and the more extensive question of the fate of thought in conditions of 
disintegration of the phonetic (phonemic) system of the language is of particular interest 
to us. There is no doubt that the sensory form of speech disorders provides particularly 
wide opportunities for analysing the role of vocal speech in the process of thought. 


What do our data reveal? What happens to the meanings of words, this generalized 
reflection of reality? What are the peculiarities of the grammatical system of speech? 
What becomes of analysis and synthesis, of the process of generalization, etc., in conditions 
of disintegration of the phonetic (phonemic) foundation of speech? Finally, what are 
the peculiar features of personality in patients with sensory aphasia? 

First of all, it must be pointed out that the investigation of a large number of 
patients with sensory aphasias of traumatic (battle-traumas) and vascular etiology, 
as well as dynamic observations in the course of corrective training, have revealed the 
general picture of the disorders. In spite of the peculiarity of each patient and a 
certain difference in the depth, gravity and stability of the disorders, in spite of 
the specific reaction of the personality of each patient to the defect, it has proved 
possible to accomplish a comparative analysis of various cases and to single out 
the most essential features which are common to all patients or to most of them. 
We have observed not an irregular or accidental agglomeration of symptoms, but 
their law-governed and interdependent correlations. In the present article we shall 
dwell in brief on the common features of personality and intellectual activity of 
patients with the syndrome of sensory aphasia. 


EFFECT OF SENSORY APHASIA ON PERSONALITY AND AFFECTIVE STATES 


In most cases patients with sensory aphasia revealed a considerable integrity of 
personality (we naturally do not touch here upon cases of particularly grave and 
extensive lesions of the cerebral cortex). We observed in the patients an intact 
attitude towards the environment, adequate social aspirations, and a differentiated and 
relatively intense emotional life. The activity of the patients was distinguished 
by its purposefulness, by a definite system of motives and by the ability to hold to intentions. 
These features manifested themselves in the course of clinical observations as well 
as in special experiments. An experimental study of the “level of requirements”, of the 
“return to interrupted actions”, etc. (experiments suggested by the school of Lewin) 
revealed an adequate stability of the process of formation of intentions, a 
sufficient and even heightened affective strain in the patients’ activity, and the 
possibility of switching (accompanied, however, by a certain inclination towards 
fixation). The patients exhibited an adequate attitude towards the social appraisal of 
their activity and did not admit any undeserved overestimation of it. The 
level of requirements was not reduced in cases of failure, and the patient did not 
look for any excuses outside himself. His reaction to “success” or “failure” in any 
activity was always very strong and often affected the main link of his emotional 
experience: the feelings evoked by his realization of the speech defect.³ 

3 Here is the record of one of the experiments connected with the study of the so-called 
“level of requirements”. Patient K. was given the task of throwing 10 rings onto a rod placed 
at a distance of four and a half metres. The instructions stated that the task was difficult; 
however, they strictly prohibited overstepping the limits of a circle outlined on the 
floor in chalk. 

The patient began his attempts to throw the rings onto the rod; at first he adopted a 
playful attitude towards the task, but did his best to understand it, regarding it as a test of 
his “dexterity”. The first failures (6 attempts) led to a number of adaptive searches. 
Without violating the instructions, the patient began to bend his body forward, to turn now 
one side of the body, now the other, to throw the rings now with one hand, now with both 
hands, etc. The steadily increasing number of unsuccessful attempts did not weaken 
the patient’s efforts. At the same time his affective strain became more and more 
intense; he stopped joking and ascribed each failure to “himself”, exclaiming: “How 
clumsy I am! Oh, what a clumsy man I am!” He never tried to justify himself or to 
blame the experimenter for the unduly difficult task. After an hour and a quarter of strenuous 
attempts to solve the task (without any violation of the instructions) the patient abandoned the 
work altogether, stating: “The task does not seem excessively difficult. If you allow me, I’ll 
try again tomorrow, without you... in order not to take your time...” The next day 
the patient worked again for 45 minutes on his own initiative, i.e., without having been 
reminded by the experimenter. Subsequently he himself informed the experimenter of his 
new failure. 

When the patient was told that the task was insoluble, he first reacted to this statement with 
distrust; then he said, smiling: “No, there was something wrong in my actions... Possibly 
someone has invented some trick... I must do my development... physical culture 
development... but speech is of prime importance...” When the patient was told that 
two of his attempts had been successful, he replied: “No, this was a matter of chance...” 

Certain peculiarities of the emotional make-up of the patients (their affective 
overstrain, inclination towards affective discharges, etc.) are closely bound up with the 
specific character of the sensory-aphasic disorder. A constant “searching situation”, 
i.e., constant fluctuation between the extreme poles of “I’ve found” and “I’ve lost” 
(a word, often a phrase, and sometimes even a thought), results from the basic disorder 
and serves as a means of compensation. The “search” does not overstep the limits 
of internal speech, but sometimes the patient performs this search aloud. For example, 
looking for the word “tiger”, which in Russian is pronounced /‘tigr/, the patient 
proceeds as follows: “/‘sora/, /‘kira/, /‘kura/, /rin/, /‘rigo/, /ga:t/, /‘\a:gir/, 
/‘ttkr/, /‘tigr/... Here it is!” 

All this inevitably ensues from the lability of the patient’s perception of speech 
and determines the specific character of his mental make-up. 


INTELLECTUAL OPERATIONS IN SENSORY APHASIA 


Let us now pass to the peculiarities of the intellectual and intellectual-verbal 
operations of patients with sensory aphasia. 

The study of a number of intellectual operations in sensory aphasia confirms the 
opinion of some authors concerning the absence of any primary intellectual defects 
in this form of aphasia. We observed an adequate level of visual-imaginative 
intellectual operations. The patients easily analysed and correlated the elements 
of objects which were presented to them visually ; they grasped and retained the 
principle of their structure and successfully synthesized figures from these elements. 

Thus, the ascertainment of correlations between elements of objects offers no 
difficulty to our patients. They are also able to classify various objects or pictures. 
In most cases they classify them according to conceptual categories (furniture, birds, 
animals, etc.) and not according to concrete situations. The patients often reveal 
the ability for abstraction, analysis and synthesis, comparison and differentiation of 
visual and verbal material. 

In practice, however, all this depends strictly on the degree of lability of the 
sounds of speech. A retained intention of thought, a grasped principle of abstract 
operation, may remain unrealized owing to phonetic verbal disintegration. For this 
reason, a task consisting of many stages or links, and particularly one requiring the 
participation of speech acts, proves difficult for patients with sensory aphasia. 
This is why arithmetical problems (including simple arithmetical counting) often 
remain unsolved (especially when they are to be solved orally), even though 
the patients fully realize the proper method of solution. Silent counting is more 
accessible to the patients, especially in written form, when each link is fixed optically.⁴ 


4 Characteristic in this respect is the frequently observed tendency to substitute written 
arithmetical operations for oral ones. The patient, as it were, performs orally the whole 
process of written counting. 

Example: Patient B., oral task to subtract 53 from 126. “Just a minute... Just a minute... 
I wish I could put it down... Just a minute... Let me see... three... then... borrowing... 
twelve... seven... seven and three... here it is... seventy... no... five... five and three... fifty 
three... Am I right?” The patient moves his hand in the air, as if drawing figures, screws up 
his eyes, etc. It seems that he imagines optically the whole process of the written arithmetical 
operation. In these conditions the lability of the sounds of speech is less pronounced. 
When, in spite of the phonetic and verbal substitutions and searches, the intellectual 
task is solved correctly, the way in which the solution is verbally expressed may be 
long, uneconomical and highly peculiar in its syntactic structure. 

We shall cite a few examples. The task offered to the patient belonged to the 
group of classification tests and was called “The fourth is superfluous”. A card 
with pictures of a dress-maker, a wood-cutter, a reading girl and a man resting 
on a sofa was presented to the patient. His task was to single out the one “superfluous” 
picture. 

(A). Patient T. “What is it?... All these... work... working... (further on the 
patient uses /‘ustvoni/ instead of /‘umstvoni/, which in Russian stands for 
“intellectual”) ...and the other, and this... (showing the man at rest) not work 
but vice-versa”. The Russian word /ra‘bote/, which means “work”, is 
mispronounced /ro‘pot2/. 

(B). Thimble, bobbin, needle, pipe. “Here is work in the thimble... sewing... 
bobbin... and a pipe... pipe... what is it: p-i-p-e ?, what can it mean ?... pipe 
of peace...men smoke them... it is also needed...if there were more of them 
there would be no war... not needed here though... (Pointing to the pipe).” 

This example shows that the loss of word-meaning and the subsequent search for 
it lead to the emergence of an accessory meaning of this word. Still, the task as a 
whole was solved in a correct way. 

We also observed opposite cases when the tasks could not be solved at all. 

(C). Foot, boot, shoe, slipper. 

“Foot... boots don’t suit... foot to the left and boot to the right... with the 
left foot... what is ‘with the foot’?” Here the patient is led away from one meaning 
of /no’go1/, “with the foot”, to another meaning, an adjective, which in Russian 
stands for “naked”. And he goes on: “What is /no’go1/? Is it ‘naked’... I don’t 
know.” 

We see that the intellectual operation proved unsuccessful not because the meaning 
of the word “foot” as a part of the body, or of the word “boot” as an article of 
footwear, disintegrated, but because the lability of sounds transformed the word nogoi 
(“with the foot”) into the word nagoi (“naked”). 

Here are some examples of juxtaposition of verbal concepts by analogy (the patient 
was given the task of finding a fourth word which would be related to the third 
given word in the same way as the second word is related to the first). 

An alley, a garden, a street: “A house... probably in a village... an alley... an 
alley... what is an alley?... Yes, Turgenev’s alley of lime-trees... Alleys... he 
was fond of lyrics... alleys... moonshine... and a garden... garden... What is 
it then?...” 

Here, too, the task remained unsolved. First the word “house” emerged, then 
the word “village” flashed through the patient’s mind (possibly as a verbal paraphasia 
for the word “town” sought by the patient); then everything disappeared, and only the 
word “alley” persisted (owing to some entirely different connections), 
carrying the patient away into the sphere of his literary experience. 


Below is an example of the emergence of opposite notions. 

Success (/us‘peh/ in Russian). Success: “missed it, was late, success, success... 
was late, success, success... That’s right (a sigh)... no success today... Oh... 
success (here the patient is looking for the word “failure”, /pro‘va:l/ in Russian, 
and shifts the sounds of the word: /us’peh/, /‘usbeh/, /us’beh/...). This is like ‘falling 
through the ice’ (the Russian /pro’va:l/ means both “failure” and “falling 
through”). What is the word? I don’t know.” 

In all these examples we witness a stability of the patient’s mental intention ; 
however, owing to the phonetic verbal disintegration, the task often proves insoluble. 

In other words, the rupture of the patient’s thought is caused not by a violation 
of the motives or logic of his thought, but exclusively by the instability of his means 
of verbal thought. 

It must be pointed out that the comprehension of the figurative meaning of proverbs 
and metaphors is in most cases also accessible to our patients. They grasp both 
the abstract and emotional-significative essence of proverbs. 

The study of incomprehension of speech in sensory aphasia (which is the basic 
symptom of the whole syndrome) showed that it is least of all connected with 
incomprehension of the logico-grammatical modes of the language. Most patients 
successfully cope with the comprehension of inflexional relationships, various voice 
forms, case inflexions, etc. Moreover, as shown by the study of the auditory perception 
of an isolated word, the patients grasp the meaning of its most generalized elements, 
or its general conceptual nature, more easily and quickly than its concrete meaning, its 
connection with the given object. The grammatical form of a word, like the 
logico-grammatical modes of the language, in most cases resists phonetic disintegration 
through the lability of the sounds of speech (see below). Consequently, the 
incomprehension of a patient with sensory aphasia is in no way “intellectual” incomprehension. 
The significative essence of speech and the significative logico-grammatical modes of 
the language (owing to which it serves as an instrument for conveying thought) 
are comprehended by the patients first of all. This distinguishes sensory aphasia from 
semantic aphasia (in the terminology of Head), with its disintegration of 
logico-grammatical constructions, as well as from motor (frontal) aphasia, with its 
derangement of mental intention, motivation, etc.⁵ 

The incomprehension of a sensory aphasic, his “verbal deafness” (in the terminology
of the classics), is of a specific character; it ensues from the lability of the sounds
of speech, from the derangement of acoustic gnosis. The comprehension of a
sensory aphasic recovers with the rehabilitation of acoustic gnosis, with the
recovery of constancy in speech perception, and with the recovery of the ability to
differentiate and to generalize in speech perception.


5 See A. R. Luria (1947).


6 Here we have something diametrically opposed to the idea expressed by the great Russian
writer Leo Tolstoy (in 1909); having in mind normal people, he stated: “It is not the word
itself which, in almost all cases, is incomprehensible, but the notion expressed by the word.”
Reversing the sense of these words, we can say: “It is not the notion which in most cases
is incomprehensible to a patient with sensory aphasia, but the word itself.”

The afore-mentioned data allow us to characterize more profoundly the intellectual
processes in sensory aphasia. Above all, everything which is usually designated as
categoriality proves to be intact. In contradistinction to what is often observed
in other pathological forms, in sensory aphasia the system of generalizations does
not disintegrate. Discursive intellectual operations (i.e., those consisting of many links,
successive in character and unfolding in time) prove to be impeded and
often inaccessible owing to the specific character of the speech defect. It is
precisely here that the defect of the phonetic side of speech manifests itself with
particular force.

Thus, sensory aphasia helps to corroborate the highly important proposition that 
acts of abstraction, which in their genesis are speech acts, subsequently prove to be 
relatively independent and do not require any constant reinforcement by vocal speech. 


THE SIGNIFICATIVE STRUCTURE OF WORDS IN SENSORY APHASIA 


To substantiate these conclusions we shall turn to the significative structure of 
words in sensory aphasia. First, we shall dwell on the results obtained by us from 
the study of the auditory perception of separate words isolated from the context. 

The patients had to ascertain the meanings of definite words. 

Here are some examples: 

1. Tractor—“What is it?.. I can’t hear... I don’t get it... Say it again...”

Sea (Russian /‘morjo/) “/‘morja/, /‘molja/, /‘poljo/. What nonsense?!”

The pronunciation of a word is often perceived in a distorted way and remains
“just merely a sound.” Tests performed on patients of the given type show that in
most cases this may happen with any word, irrespective of the degree of its abstractness,
its grammatical category, etc.

2. Frequently the disintegration of the phonetic structure of the word makes the
meaning of the whole word incomprehensible to the patient; grasping, however,
a certain element from the pronunciation of the word, the patient identifies this
element (a syllable or a sound) with the meaning of the whole word. Sometimes
this takes place in the course of his “searching for sounds” (this term conventionally
designates the reiterative utterance of the given word to which the patient often
resorts in order to understand the meaning of the word). Factory (in Russian
/‘fa:briko/)—“/‘va:briko/... /‘priko/... /‘va:bri/... /bra:/... /bra:k/...
/bra:/... Is that so?” (there is no sense in it at all).

Crane—“It’s something in cans. Something to eat...” (Crab).

We see that a distorted perception of the sound of a word leads to a wrong perception
of its meaning. In the process of his “searches” the patient snatches at any meaning
which flashes through his mind. Undifferentiated perception of pronunciation results
in undifferentiated perception of the meaning.
3. One and the same subject (sometimes at the same stage of recovery, and
sometimes at different stages) may exhibit a more adequate auditory perception of 
the meanings of words. In such cases the polysemy of the word may disappear, and 
the subject perceives the word in a limited, one-sided sense which is connected with 
the given context only. 

Sleeve—“Something long... It is big... It is wound up... Something is let out
of it in the country... something big... Made of rubber... (Are dresses with
sleeves?) I don’t know... I haven’t seen.”

The Russian /ru’ka:v/ stands for “sleeve” (of a dress) as well as for “rubber
hose”.

Sharp—“Sharp question means complicated question, unpleasant question, and
a tongue too... everybody is afraid of it... And what other sharp things are there?
I don’t know.”

It is noteworthy that as a result of such limited perception the direct concrete, 
material meaning of the word disappears, while its figurative or abstract meaning 
remains (the latter, however, owing to its one-sided character, sometimes plays a 
concrete role). 

Thus, the examples cited above show that when the phonetic basis of speech 
is deranged, the many-sidedness, the hierarchical character of the semantic content 
of the word may become reduced. 

Hygiene—“This is something from the sphere of medicine. Give me a sentence,
and I shall probably find it...”

Football—“Something connected with physical culture... But what?”

In comparison with the previous group, these examples reveal a considerably more
distinct and complete ascertainment of the conceptual meaning of the given word
or of its connection with a definite sphere of activity. Still, even here the word
remains only an undifferentiated “hint” of the real wealth of generalizations which
lie behind the word. It must be emphasized that this “conceptual trace” is distinguished
by its stability, being retained even when, in the course of the ascertainment,
the subject loses the pronunciation of the word.

Often the patient retains the grammatical meaning of a word distorted as a result 
of sensory aphasia. He can properly grasp and retain the etymological structure of 
the word or its grammatical category, while the meaning of the word disappears. 

To jump—“Tiger... a jump... of a horse... It is a verb... it acts... it is
an action.”

More often, however, sensory aphasics define the meaning of the grammatical, 
suffixal part of the word with great precision though they do not grasp the pronunciation 
and at the same time the meaning of the root, of the lexical part of the word. 

Energetic—“It is an attribute... I don’t know what it means... It is a man...”

A small bell /kolo’koltʃik/ (the ending /tʃik/ means something small, like -let in
the English word booklet)—“little... can be a glass (/sto’ka:ntʃik/). Can be
something else.”

Thus, the subjects are able to perceive the suffixes of words as indicators of their
grammatical form, but they can in no way perceive the word’s connection with the given object.


We, therefore, come to the conclusion that the more generalized (in the given case) 
grammatical meaning of the word is very stable, being less dependent on the 
undifferentiated and inconstant pronunciation. It must be pointed out that even at 
an early stage of recovery, when the derangement of the analysis and synthesis of 
the sounds of speech is particularly pronounced, the perception of the dissociated 
grammatical form proves quite possible. 


Similar law-governed phenomena also manifest themselves in the oral speech of our
patients. This is testified to by the derangements of the significative structure of
words in the course of oral speech, which are expressed in so-called verbal
paraphasias (substitutions of words). As revealed by analysis, verbal paraphasias appear
on the basis of phonetic inconstancy and the inability at any given moment to actualize
the necessary word. The word proves to be an inconstant indicator of meaning;
the limits of word-meanings are extremely mobile and diffuse.⁷ Along with their
undue extension, we observe cases of their limited utilization in the oral speech of
sensory aphasics. The grammatical characteristics of the word and its connection
with a definite conceptual category prove to be the most “stable” in the process of
substitution.


At the same time verbal paraphasia is closely bound up with the motive and 
purpose of the whole utterance. In the present article we shall give illustrations 
only of some forms of verbal paraphasias in the oral speech of our subjects. Among all 
forms of paraphasias (substitutions) there are some in which undifferentiated word- 
meanings due to inconstancy of the sounds of speech constitute the most characteristic 
feature. 

One of our patients told the physician about his failure to write a composition ; 
giving the physician a sheet of paper with a text which was thoroughly crossed out, 
he said: “I have closed it (instead of “crossed it”)...so that you might not...” 


Another patient (a woman), giving the physician a note she had written, said: “At
last I have hatched a note... (instead of “written”) you wanted it so much...”


These examples first of all show that the substitution of a word does not entail 
any change in its connection with a definite grammatical category. The verb is 
replaced by another verb, the noun by another noun, etc. The case, gender and 
other relationships are also preserved. The substituted word remains within the 
general structure of the sentence and allows (to some degree, of course) expression of 
the essence of the thought. 


7 This symptom is dealt with in all research devoted to aphasia. Whatever their view
of the nature of this interesting phenomenon, most authors analyse involuntary
substitutions of words in the speech of aphasic patients from the point of view
of the relationship between speech and thought. Paraphasia is interpreted as an expression of
a “round-about way” used in difficult situations connected with the process of the verbal
realization of thought (Galperin and Golubova, 1939, Lebedinsky, 1941, Goldstein, 1948, Lotmar,
1913, and others).
In the examples cited above the motivational side of verbal expression stands out in
bold relief. It is precisely this factor which determines the excessive or, as it were,
accentuated⁸ expression of the sense arising in the substituted word (paraphasic
word). For example, the words “I have hatched” instead of “I have written” 
express the main feeling of the patient, the tendency lying behind the phrase—the 
difficulty with which the writing of the note was connected, the fact that the patient 
had, so to speak, to “hatch” it. The phrase “I have at last hatched a note” stresses
the general feeling of the patient—her dissatisfaction with herself for the long and 
strenuous accomplishment of such a relatively simple task. Thus, when a word 
is substituted, the motive and purpose of the verbal expression as a whole may remain 
intact. 


Sometimes a word is replaced not by another word, but by a peculiar phraseological
expression. Instead of the word history one of our patients said: “That which revives
the depth, the shadow of centuries”; instead of the word although—“two points
contrariwise”; instead of the words I don’t understand—“with pure intellect I am
still unable...”


All the words entering the given expression are indissolubly connected with each
other. The sense can be understood only from the whole expression. This integral
expression, in which separate words are lost, produces the nuance which is dictated
by the context of the utterance (these expressions are almost never repeated by one
and the same patient).




For example, the words “with pure intellect I am still unable...” express the
dissatisfaction of the patient with his own intellect. The “pure intellect” proves to
be ineffective (the patient understands the speech of other people with difficulty,
writes badly, etc.). Here, too, the paraphasic phrase replacing the given word expresses
the significative side of the patient’s thought.


In some forms of speech activity (for example, in repetition) we observe word
substitutions which reveal the peculiarities of verbal generalization in pathological
conditions. They are connected with different forms of non-differentiability of word-meanings.


Such substitutions may take place within one and the same significative group of
concepts, and even within one and the same level of generalization:
instead of “cock”—“hen”
instead of “shoes”—“sledges”
instead of “knife”—“fork”.


Sometimes we observe a divergence in the level of generalization between the
substituting and substituted words:
instead of “dog”—“domestic animal”
instead of “tractor”—“agriculture”
instead of “butter”—“food”.


8 The term “accentuated” is used by us in a purely conventional way. It relates to the
affective tendency exhibited by the patient (though not realized by him) in connection with
the inability to actualize the necessary word.

Very often instead of a concrete concept there arises a more generalized one (the 
same was observed in the analysis of the perception of the structure of words). 

Word substitutions are sometimes connected not only with different degrees of 
generalization (within the limits of one concept). They may be also connected with 
diverse significative verbal connections. Sometimes it is some concrete notion, some 
image, usually a habitual one (for example, some substitute for the word bird).⁹ In
some cases the substitution can be understood only with the help of intermediate links
(eyes—“window” or to buy—“to remember”). Quite often significative verbal
substitutions are connected with the isolation of one principal property, attribute, 
or aspect of the word. For example, the word plate as a substitute for the word 
ditch could emerge only because one attribute of form (a hollow) was isolated from 
the varied semantic aspects of the word ditch. The substitution of the word fine for 
warm (weather) is also connected with the isolation of the attribute of positive 
quality from the word warm, all other attributes being completely disregarded. The 
same can be said of the paraphasia cold instead of ice, or bed-sheet instead of wall. 
Here, too, the substitution is based on the isolation of only one, definite attribute. 

Thus, the afore-mentioned examples reveal the possibility of a limited utilization 
(on the basis of only one attribute) of verbal meanings in the oral speech of sensory 
aphasics (we observed the same when analysing the peculiarities of the auditory 
perception of words). 

It must also be pointed out that owing to the same general cause of the inconstancy 
of speech sounds in both forms of speech activity, the role of the emotional factor 
and compensatory adaptation of the patient to the grasping and expressing of the 
meaning of the word (which may be lost at any moment) considerably increases. This 
manifests itself both when a word is perceived by ear and when it is used in oral 
speech. 

A syntactic analysis of the system of speech of sensory aphasics at early and late
stages of recovery shows that not all parts of speech are equally disintegrated. We
have found that the presence or absence of certain parts of speech in sensory aphasia
is to some degree a law-governed phenomenon¹⁰ (Beyn, 1947, 1957).


9 This substitution is probably connected with the Russian proverb: “A word is not a 
sparrow, just let it out and you will never catch it again.” 


10 The syntax of the oral speech of a sensory aphasic differs from the nominative, material
system of speech in certain forms of motor aphasia (telegraphic style), which, according to a
number of investigations, is connected with the disintegration of internal speech with its
inherent predicative character and which in one way or another testifies to the concretization
of the thought of a motor aphasic (Luria, 1947).

The oral speech of a sensory aphasic is to some degree the opposite of the under-developed
speech of a deaf-and-dumb child. The speech of such a child manifests a predominance
of material words over names of actions; the number of words belonging to other parts of
speech, especially of link-words which indicate relationships, is insignificant. This composition
of speech conforms to the visual-imaginative thought of a deaf-and-dumb child deprived of
the auditory perception of speech (Boskis, 1953).

The profound changes taking place in the syntactic system of speech of a sensory
aphasic whose mental intention and intonation remain intact
differ from all forms of agrammatism so far described in the literature. We have already
pointed out that owing to the inconstancy of the sounds of speech the differentiation
of the concrete material meanings of words declines. It turns out that material words
are the first to disappear from the “phrase” of a sensory aphasic, and that with the
changed structure of the meanings of words the whole “phrase” becomes reshaped.
On the other hand, verbal forms are richly represented in his speech. The syntax
of the speech of a sensory aphasic is distinguished by its predicative character.


BRIEF CONCLUSIONS AND DISCUSSION OF RESULTS


We emphasized at the beginning of the present article that diametrically
opposed viewpoints have been expressed in the literature concerning the peculiarities of
thought in aphasias in general, and in some of its forms in particular. Some authors
tried to prove that a decline of categoriality of thought and behaviour is an indispensable
feature of patients suffering from aphasias. Others, on the contrary, considered the
absence of any disturbances of thought an invariable feature of aphasic
disorders. It is noteworthy that some authors, while ascertaining certain peculiarities
of thought in aphasias, regard them only as externally “associated” with
speech disorders.

The factual material set forth in this article and the results of our study of the
significative side of speech in sensory aphasia, as well as of some mental operations,
give us sufficient ground to reject both the either-or formulation of the question of
whether or not there exist certain disturbances of thought in aphasias, and the approach
to these disturbances as externally “associated” with speech disorders. We have tried
to prove that sensory aphasia is characterized by a relative intactness of thought as
a generalizing and purposeful activity. This is testified to by the intactness of
visual-imaginative gnosis and praxis, peculiarity of behaviour, affectivity, and finally,
the above-described dissociations of verbal thought observed in this form of
aphasia.¹¹


11 An article by Ahrens (1957) contains data which to some extent coincide with ours;
analysing a number of drawings by a sensory aphasic, the author shows the peculiar character
of his intellectual-speech processes (sprache-rationalen). This peculiarity consists in the diffuse
character of the meanings of words, in their insufficient differentiability. Of particular interest,
in our opinion, is the author’s idea that schematic and geometrical drawings are more
accessible to sensory aphasics than drawings of concrete objects, because more abstract forms,
as it were, “resist” the amorphousness of the optical images retained in memory and the
diffuseness of speech concepts.
We have shown that one and the same dissociation is characteristic of all aspects
of the significative structure of speech in sensory aphasia, namely a greater accessibility of
generalized, abstract and schematized processes in comparison with concrete, material
processes. We have also shown that this relates not only to a unit of verbal thought
(the meaning of the word, when it is perceived or used in oral speech) but also to
the whole syntactic system of the patients’ utterances. This characteristic dissociation
is caused by the derangement of the phonetic side of speech, by the disintegration of
the phonemic system of the language in sensory aphasia.¹² The semantically generalized
structure of the speech of a grown-up person, which in genesis is indissolubly connected
with the phonetic structure, acquires, so to speak, a certain stability and becomes
independent of the phonetic side when the latter is pathologically affected.


The following fundamental question arises: what role do all these phenomena
play in the process of verbal thought of our patients? Can we assert that thought
remains intact in conditions of pathological phonemic analysis and synthesis, inconstancy
of the sounds of speech, deterioration of the concrete, material meaning of words and
modification of their generalizing role? No, we can by no means assert it. This
question has already been elucidated by us.


Owing to the undifferentiated word-meanings and to the variable limits of their
usage, the process of realization of thought, the process of verbal thought in our
patients, just like their communication, proves to be impeded. Dissociation, as a result
of which material generalization suffers first, must inevitably lead to a peculiar
weakening of all other sides of verbal generalization. If the word “bell” is at a
certain moment perceived by the patient only as “something small,” the word
“swallow” only as “a living being”, and the word “sharp” only as “a sharp
question,” the patient’s thought under these conditions cannot be realized successfully
enough. Only when, as a result of training, the phonetic side of speech recovers both
in the link of perception and in the link of retention and reproduction of the sequence
of sounds, does the reflection of the diverse objective properties of things and phenomena
by speech become possible.


With the recovery of phonetic analysis and synthesis, the grammatical forms and
verbal generalizations are no longer weakened and cease to play a concrete role
connected with a particular situation. They begin to serve the direct purposes of
communication and thought.


In conclusion, it must be pointed out that the basic facts set forth in the present
article are closely bound up with the theoretical proposition concerning the non-conformity
of the development of the phasic and semantic sides of speech in early
childhood, in spite of all their unity.¹³ This is probably due to the fact that in
ontogenesis “speech by its structure is not simply a mirror reflection of the structure
of thought” (Vygotsky, 1956), but represents a complex unity, owing to which the
aforesaid pathological dissociations are quite possible. At the same time, pathological
dissociations reveal the significance of the phonetic side of speech as a basic factor
and confirm the proposition concerning the close relationships that exist between
phonetics, morphology and syntax, on the one hand, and thought and speech, on
the other.


12 This dissociation is probably due to the fact that in the process of the child’s development
the differentiation of the phonemic system of speech is closely connected with the discrimination
of concrete objects of the surrounding world. Perhaps, owing to this, that which is most
concrete in speech suffers first when the phonetic side of speech is disintegrated.


13 As is known, the available data relating to the development of a child show that the 
significative side of its speech develops in the direction from the whole to a part—from a 
sentence to a word, while the external side of speech develops in the opposite direction—from 
a word to a sentence. 
A DEVELOPMENTAL CRISIS THEORY OF STUTTERING*


GERTRUD LASCH WYATT 
Wellesley Public Schools, Wellesley, Mass. 


The relationship between mother and child has been considered as the interpersonal 
matrix within which the development of language in children occurs. Processes of 
successful language learning and of developmental deviations have been analyzed 
within this framework. Stuttering has been interpreted as the result of a crisis in 
language learning coincident with a crisis in the relationship between mother and 
child. The general hypothesis is that a disruption of reciprocal identification between 
mother and child, occurring at the time when the child is in the practising stage of 
early grammatical speech, results in the child’s inability to master language learning, 
expressed in a regression to earlier forms of language behaviour. An experimental 
study was designed to test four operational hypotheses derived from the theory. 
This study has implications for research in symbolic learning occurring within an 
interpersonal matrix, and also for therapy with stuttering children and their mothers. 


THEORETICAL ISSUES 


In this study a “developmental crisis” theory of stuttering has been proposed 
according to which the onset of stuttering in a child’s speech has been interpreted 
as the result of a crisis in language learning coincident with a crisis in the relationship 
between mother and child. 


Stuttering or stammering is a repeated involuntary disruption of the fluency of 
connected speech. Symptoms may range from compulsive repetition of initial sounds 
and syllables, to prolongation of vowel sounds, blocking of speech and involuntary 
accompanying movements of various parts of the body. 


Theories concerning its origin have been put forward since the days of ancient 
Greece. Older theories have been highly speculative, while a more systematic clinical 
or experimental approach has become prevalent during this century. 

Edward Sapir was among the first to stress the socio-cultural determinants of
language behaviour in man, in contrast to the merely physiological aspects
of the act of speaking (Sapir, 1921). He warned against studies in which
language was defined as an entity in psycho-physical terms alone. His emphasis upon
the interpersonal aspects of language behaviour found further elaboration in the
psychology of language and communication. During recent years a field theory of
language has come into being in which the unit of analysis is no longer the individual
speaker. Observation is focused upon interaction between speaker and hearer, the
characteristic patterns of this interaction and the psychological relationship between
the speech partners. Sapir’s work had, however, little effect in the field of speech
pathology, and much of the research in the area of stuttering continued to be done
on the level of the so-called “speech mechanisms” only. The literature on stuttering
has been extensive. Froeschels, in his Lehrbuch der Sprachheilkunde (Froeschels,
1925), presented a historical survey of European and American theories. Currently
prevailing theories can be classified in three groups:

Genogenic Theories. In these theories the stutterer is considered to be biologically
different from the non-stutterer. The most influential of these has been Orton’s and
Travis’ theory of “mixed cerebral dominance” (Orton, 1937, Travis, 1931).

Psychogenic Theories. Psychoanalysts and psychiatrists have classified stuttering
among the psychoneuroses (Blanton, 1936, Coriat, 1928, Fenichel, 1945, Schneider,
1922), or among the borderline disorders (Glauber, 1944, 1953), stressing the
predominance of strong unconscious oral-aggressive and anal-sadistic attitudes in
the stutterer.

Developmental Theories. In these theories the stutterer is considered not inherently
different from the normal speaker. The change from normal to abnormal speech
is seen as a gradual one, related to early disturbing environmental situations interacting
with the child’s own feelings of insecurity and anxiety. Froeschels and the “Vienna
School” (Froeschels, 1948), and Johnson and his students at Iowa State University
(Johnson, 1955) have stressed the normalcy of the repetition of sounds and syllables
in the speech of the young child.


* This report is based in part on a doctoral dissertation submitted to the graduate faculty of
Boston University, April, 1958. The writer wishes to express her appreciation to Professors
Chester C. Bennett, Austin W. Berkeley, A. William Hire and Albert T. Murphy for their
generous help and advice in this investigation. The dissertation is on file at Boston University
Library under the title: “Mother-Child Relationship and Stuttering in Children.” A microfilm
copy of the dissertation is available from University Microfilms, 313 N. First Street, Ann Arbor,
Michigan, Library of Congress number Mic 58-3130.

The prevailing theories were found unsatisfactory for the following reasons: 

There is very little knowledge of detail or precision as to the nature of the conditions 
which would be significantly related to the onset of stuttering. Such highly non- 
specific factors as excitement, fear, hostility, tension, emotional stress, parental 
attitudes or educational practices have been held responsible for the child’s speech 
difficulty. 

The major weakness of most studies seemed to lie in their lack of conceptual 
schemes for systematizing the data and bringing them to bear on crucial psychological
problems. Without the prior assumption of a particular set of constructs which 
dictates the selective organization of experience, the number of possible variables to 
be studied is practically unlimited and research tends to become non-specific and 
unwieldy. 

Developmental theorists, who interpreted stuttering as a deviation in child development,
have neglected to apply the principles of general validity in genetic psychology
to its study. In none of the theories has the nature of continuous change, which
is inherent in child development, been acknowledged; nor has the notion of critical
periods in development and of the crucial role of the timing of developmental disturbances
been adequately considered.


RELEVANT PSYCHOLOGICAL THEORIES 


Stone and Church have pointed out that psychological categories change their 
nature and meaning in the course of development, thus becoming qualitatively 
different from one age to the next (Stone and Church, 1957). Werner demonstrated 
that each new and higher stage of development is fundamentally an innovation, not 
merely an addition of certain characteristics to those of the previous level (Werner, 
1957). Piaget has shown that the child’s perception of reality—including perception 
of his mother—undergoes a radical change during the early years of life (Piaget, 1954). 
Piaget also analyzed the successive stages of sensori-motor learning dependent upon 
imitation of a model (Piaget, 1951). Mutual imitation between mother and child 
could be observed during the early stages of language learning. Acts of imitation tend 
to be repeated, but cease with or shortly after the disappearance of the perceived 
model. In the advanced stages of imitative learning, however, sensori-motor patterns 
become internalized and the child no longer depends upon the presence of the model;
he has become capable of imitating internally a series of models in the form of images. 
(See also Berlyne, 1957.) 

Piaget’s observations agree with studies by Buxbaum, Gesell, Hendrick, and 
McGraw, who observed three phases in the development of motor abilities: the 
reflex phase; the period of practice, during which the child gives evidence of a need
to practise repetitively; and finally, maturity in functioning (Buxbaum, 1947; Gesell,
1929; Hendrick, 1942; McGraw, 1935). Interference with the learning of an activity
during the practising stage may lead to the appearance of compulsive repetitions
(Hendrick); it may also be expressed in inhibition or in fixation in the practising phase
of functioning (Buxbaum).

Shands, following the stages in the child’s early use of symbols, hypothesized that 
more indirect contact with the love object becomes possible by means of signs and 
symbol functioning, which permits the relationship to remain real to the child at a 
greater and greater distance from the love object (Shands, 1954). S. J. Baker, finally, 
considers reciprocal identification between speech partners as the core mechanism 
operating in all speech relationships (Baker, 1951, 1955). One-sided withdrawal of
reciprocal identification, when the partner is unprepared for it, may have a traumatic
effect upon the partner and leave him in a state of acute and intense frustration.

In this study we have tried to demonstrate that previous research into the nature 
of stuttering has been hampered by the application of scientific models of explanation 
which do not fit the interpersonal nature of language processes. Utilizing certain 
concepts and assumptions in genetic psychology, in the psychology of language and 
in psychoanalytic ego-psychology, we have formulated a “developmental crisis”
theory of stuttering. 


THE GENESIS OF STUTTERING


Stuttering represents a deviation in language development. The “learning of
stuttering” is embedded in the normal learning of speech patterns, from which it
deviates at a given point. In order to understand stuttering, its origin and nature,
we must determine under what conditions, in what manner and at what point in
the language development of a given child the process of normal language learning
is disrupted and development begins to deviate. Thus, a clinical description
of the “chronology of the pathological structure formation” (Gero, 1953) must
begin with a description of the normal developmental processes preceding it.


STAGES OF LANGUAGE LEARNING 


Primary language learning in childhood passes through a series of characteristic 
integrative stages. The progression from the newborn infant’s first pre-symbolic cry
to meaningful concepts, and finally to a complex system of
references, takes many years of the individual’s life. In the course of this development
the child has to cross two important thresholds. The first of these is the change 
from non-symbolic to symbolic speech; the second consists of the change from 
non-relational to relational speech. If we arrange the evolving patterns of speech 
in relation to the two major thresholds, we can map out three integrative stages. 


1. The pre-symbolic stage. The pre-symbolic stage extends from birth to the 
appearance of the first word or consistent phonemic symbol. Cooing and repetitive 
babbling appear as preliminary to the establishment of distinctive sound patterns. The 
production of sounds is primarily auto-erotic and becomes only gradually object- 
oriented (Freud and Burlingham, 1944). 


2. The early symbolic stage. Crossing the first threshold the child discovers that 
everything has a name. Language becomes object-oriented and symbolic. Soon after 
the acquisition of the first words and simultaneously with the development of patterns
of articulation, the earliest and most primitive modes of relation appear in the child’s
speech in the form of juxtaposition and of word order. The infinity of experiences
must undergo simplification and conventionalization, first through the process of
naming, then through the processes which place words in relation to each
other. This gradual emergence of syntactic forms, passing through various preliminary
stages, finally culminates in the appearance of formally correct sentences. 


3. The early relational stage. With the emergence of syntactic forms the child 
has crossed the second threshold and has reached the early relational stage and the 
beginnings of grammatical speech. The same child who at the age of 20 months
used chainlike phrases such as “Fall down bump head” said two months later: “May
I have the broom when I get up?” A truly remarkable development has occurred.
Through the never-ending repetitive use of short phrases with which the child
accompanies his activities and experiences (called “commentary speaking”), he has
finally reached more abstract modes of expression.


REPETITIVE SPEECH AND LANGUAGE LEARNING 


Repetitive speech occurs at all levels of language learning. Repetitions seem to 
occur most frequently at a period when the child is approaching a more advanced level 
of performance and is practising new speech patterns which are more complex than 
those he used at an earlier stage. 

It is important to note the differences in the form of these repetitions. At each 
developmental stage a unit of speech is repeated which is characteristic of the particular 
level. 

At the pre-symbolic level, the child repeats sounds and syllables: Nyam nyam hagliwo 
manamana manayooyoo. 

At the early symbolic level, the child shows almost unending rhythmic repetition of 
words. A child at the age of 183 months, looking at a picture book, said: book book 
buts buts buts horsy horsy sits sits baby book horsy book horsy book—what dat ? 
what dat ? 

At the early relational level, the child repeats phrases. At the age of 263 months 
the same child, looking at a picture book, said: All streetcars, another streetcar, all 
the horses there, another horsy, policeman on the horsy, lady sleeping, lady sleeping, 
lady sleeping in the bed, I see the houses, see the houses. 

Repetitions of characteristic developmental units should be considered developmental
repetitions. There are, of course, no sharp lines of demarcation between
different linguistic phases, nor do the repetitions characteristic of earlier speech levels
disappear immediately with the appearance of more advanced forms of speech. As
Werner has said: “Development does not proceed from one definite and permanent
level to the next; it rather oscillates around relatively stable levels of integration
reached by an individual at a certain point in time.”¹ Temporary regressions to
earlier levels need not be interpreted as a sign of learning difficulty,
provided the child is able to advance again without seeming effort to the more complex
level. It is important to notice that all these varying forms of repetition are produced
with ease, without any sign of effort and usually with apparent pleasure and playfulness.
They are experienced as ego-syntonic.

¹ H. Werner. Address before the Massachusetts Psychological Association, 1955.


THE ONSET OF STUTTERING


The transition from non-relational to relational speech appears to be a crucial
period in language development. Linguistic forms of rapidly increasing complexity
are mastered by the normal child within a period ranging from six to twelve
months, and they are mastered by an organism which is still neuro-physiologically
immature and psychologically extremely dependent upon environmental support. In order
to reproduce complex speech patterns the child’s memory needs constant reinforcement
through acts of mutual imitation. If the close relationship between mother and child
which is common during early infancy is disrupted during the period when the
child is at the threshold of early relational speech, we witness a double crisis: the
coincidence of an intra-personal crisis (learning a more complex language structure)
with an interpersonal crisis (disruption of reciprocal identification between mother
and child).


Enforced premature distance from the mother causes the child anxiety concerning 
loss of the mother object (distance-anxiety) and interferes with the learning process. 
Searching anxiously for the primary model or for an acceptable substitute, he becomes 
unable to master the intricacies of a more advanced language pattern and falls back 
upon earlier repetitive forms (regression). Under the influence of his anxiety these 
repetitions acquire compulsive character. 


It is not necessary to assume that a disruption of the mother-child relationship 
must be caused by hostility on the part of the mother. In the majority of cases it 
will be caused by ordinary life circumstances such as physical separation of mother 
and child because of the child’s or the mother’s illness or hospitalization, birth of a 
sibling, illness or death of a family member, moving into another home, or other 
fateful events in the life of a family which cause temporary absence or inaccessibility 
of the mother. Actual rejection of the child and conscious or unconscious hostility on 
the part of the mother may exist in some cases and may cause a more complex 
disturbance of the child’s behaviour with stuttering representing one symptom 
among many.²


The diagnostically significant symptom of initial stuttering is the appearance of 
compulsive repetitions which should be clearly differentiated from the developmental 
repetitions described earlier. The linguistic units repeated under the influence of 
frustration and anxiety are no longer representative of the child’s normal stage of 
language development ; they are different in kind from developmental repetitions. 
Compulsive repetitions no longer serve as building stones in the construction of 
larger syntactic units. The child is no longer able to shift freely back and forth 
between simpler and more advanced language patterns. An example of compulsive
repetitions might sound like this: “... t t t to throw the co co co clay away ...” Such
compulsive repetitions are experienced as ego-alien by the child.


² As an example of such a “mixed case” of stuttering, see O. Pollak, Integrating Sociological
and Psychoanalytic Concepts. New York: Russell Sage Foundation, 1956 (The Case of
Steven M., pp. 68-104).


THE DEVELOPMENTAL CRISIS THEORY OF STUTTERING


In this theory stuttering was seen as a deviation in primary language learning. The 
theory was expressed in the form of six basic assumptions. 


1. The acquisition of language by the child, though dependent upon maturation
of the organism, is essentially a process of learning through imitation. In this 
learning process the mother (or her substitute) serves as the model for the child’s 
attempts at imitation of the language patterns which are specific for a given 
culture. Thus, primary language learning occurs within an interpersonal matrix 
of mutual imitation and of reciprocal identification between mother and child. 
A continuous, uninterrupted and affectionate relationship between mother and 
child provides the optimum condition for successful language learning in early 
childhood. 

Language learning goes through a series of interrelated stages. Each new and 
higher stage of development represents fundamentally an innovation, not merely 
an addition of certain characteristics to those of the previous level. In individual 
children and under specific circumstances, shifting from a less differentiated to 
a more differentiated stage may produce a crisis in language learning. 

Increasing mastery of the patterns of symbolic language permits the child to
sustain the relationship with the mother at a greater and greater distance in
place and time. Once the child has reached the stage of internalizing the basic
linguistic patterns of the model, he can reproduce them even if the model has
been absent for some time. Eventually autonomy of the function will be reached 
when the child can dispense with the original model. He will then be increasingly 
able to modify the linguistic patterns at will and will turn to a variety of 
different models (people) for additional learning. The child’s inter-personal 
network of communication changes gradually from a basically dualistic to a 
pluralistic one. 

The child’s relationship to the original model (mother or her substitute) is of 
particular importance during the practising stage of a new function or activity 
or during the period when the child is learning new patterns of a more complex 
nature, prior to the attainment of efficient performance. A disturbance of the 
mother-child relationship occurring at such a critical period will lead to com- 
pulsive repetition of effort, to fixation in the stage of development at which 
the disturbance occurs and to activation of the kind of hostility which the 
child’s particular phase of development puts at his disposal. 

A disturbance in the reciprocal identification between mother and child,
occurring at the time when the child is in the practising stage of early grammatical
speech, will result in inability on the part of the child to continue language
learning successfully, expressed through the initial symptoms of stuttering.
Unable to sustain his identification with the mother at a distance, the child
feels increasingly anxious and angry at his mother. These feelings of anxiety
and anger lead to a gradual change in his perception of the mother, to a further
and more complex disturbance in his relationship to her and eventually to 
depressive anxieties and subsequent defensive mechanisms and to the symptoms 
of advanced stuttering. 


THE EXPERIMENTAL STUDY: METHOD AND PROCEDURES 


From the theory presented the following general hypothesis was deduced: Once 
stuttering in a child is established we expect to find demonstrable signs of a disturbed 
mother-child relationship. We further expect that in a comparison between stuttering 
and non-stuttering children of the same sex, age and intelligence, we will find that 
the stuttering children differ from the non-stuttering children in the experience of 
this disturbed relationship. 


In order to test the validity of this hypothesis a method was designed which would 
permit investigation of the feelings of stuttering children and at the same time 
permit a demonstration of the difference in the feelings of stuttering and of non- 
stuttering children. 


Three dimensions of the feelings of stuttering children were investigated: 


1. The intense need for physical and emotional closeness to the mother, called
“distance anxiety”.


2. The intense feelings of anger directed against the mother, called “feelings of
devaluation of the mother”.


3. The intense feelings of helplessness and fears of impending disaster, called
“fears of disaster”.

Three special hypotheses were formulated:

Hypothesis I. Stuttering children experience intense distance anxiety more frequently
than non-stuttering children.

Hypothesis II. Stuttering children experience intense feelings of devaluation of the
mother more frequently than non-stuttering children.

Hypothesis III. Children in the advanced stages of stuttering experience intense
fears of disaster more frequently than non-stuttering children.

The interpretation of the third dimension called for a subsidiary hypothesis, IIIa:

The experience of intense fear of disaster tends to be confined to children in the
advanced rather than in the initial stage of stuttering.

It was assumed that children would be only partly conscious of their feelings of 
anxiety and anger concerning their mothers and that these feelings could not be 
discussed with them directly. It was, however, assumed that through the use 
of projective tests the children’s feelings could be elicited in the form of phantasies, 
the content of which would be apparent in the form of verbal responses to the test 
stimuli. 


THE INSTRUMENT


It was felt that the use of a traditional projective test would provide us with only 
a small segment of the kind of content we were interested in. In order to get a 
sufficient quantity of specific phantasies related to our predetermined set of variables, 
a battery of projective tests was developed, called the Mother-Child Relationship Test 
(MCR Test). It consisted of five pictures taken from Bellak and Bellak’s Children’s 
Apperception Test (CAT), (Bellak and Bellak, 1949), three pictures from Murray’s 
Thematic Apperception Test (T.A.T.) (Murray, 1943), and a story completion test, 
called “Episodes”. The latter contains nine story stems, four of which were taken
from the Duess Fables (Duess-Despert, 1946), while five were developed for the
purpose of this study. All children were tested at the source of referral. The test
session was opened with the administration of the Goodenough Draw-a-Man Test
(Goodenough, 1926), followed by the MCR Test. Children whose stuttering symptoms
were so severe that they were unable to produce coherent responses had to be excluded
from the experiment.


SCORING PROCEDURE 


A set of rules for the scoring of the three dependent variables was worked out. 
Each scorable response was scored on three degrees of intensity, namely: 


1. No evidence of the variable found in the response, score 0. 


2. The variable is evident with mild affective loading, sometimes in the form of 
a playful or joking expression, score L (low). 


3. The variable is evident in an intense or excessive form with high affective 
loading, score H (high). 


Responses were scored first by the experimenter, then by two independent judges.³


The number of records was 40 and the number of scores 1800. The raters reached
complete agreement on 1634 scores, partial agreement on 154 and no agreement on 12.
Allowing half credit for partial agreement, the raters reached 95% agreement.
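The 95% figure follows from the counts above: with half credit for partial agreement, agreement is (1634 + 154/2) / 1800 = 1711/1800, or about 0.951. A minimal sketch of that arithmetic; the function name is a hypothetical label of ours, not something from the study:

```python
def agreement_with_half_credit(complete: int, partial: int, disagreed: int) -> float:
    """Share of scores on which raters agreed, counting partial agreement as half."""
    total = complete + partial + disagreed
    if total == 0:
        raise ValueError("no scores to compare")
    return (complete + 0.5 * partial) / total


# Counts reported in the text: 1634 complete, 154 partial, 12 none (1800 scores).
rate = agreement_with_half_credit(1634, 154, 12)
print(f"{rate:.1%}")  # 95.1%
```

This is simple percentage agreement; it makes no correction for agreement expected by chance.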


The dependent variables chosen were distance anxiety (DA), devaluation of the 
mother (MD) and fear of disaster (FD). The independent variable chosen was the 
language behaviour of the children tested, namely the presence or absence of stuttering 
during the test sessions. In addition, the stage of stuttering was treated as a subsidiary 
variable for part of the study. In each case the diagnosis of stuttering was made first
by the individual referring the child to the experimenter, then by the experimenter,
who applied the diagnostic criteria which were defined earlier in the study.

³ Dr. Doris Gilbert, Harvard University, and Dr. Donald C. Klein, Wellesley Human Relations
Service.

The experimental group (E) consisted of 20 stuttering children between the ages
of 5.6 and 9.6 years, with an I.Q. of 90 or above, determined by the Goodenough
Draw-a-Man Test. The control group (C) consisted of 20 non-stuttering children 
matched as to age, sex, and intelligence. In order to establish a homogeneous sample of 
stuttering children, care was taken not to include any child with additional pathology 
—be it medical, psychological or social in origin—in the E group. 

Children in the E group were referred from three Eastern public school systems,
and from two speech clinics. The children in the C group came from Eastern public 
and private schools. 

The E group consisted of 70% boys and 30% girls, and the C group of 
75% boys and 25% girls. The hypothesis of homogeneity of groups with regard 
to age and I.Q. distribution was tested by Fisher’s t test and was found tenable.
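Fisher’s t test is presumably the ordinary two-sample t test; the text does not say whether the pooled-variance form or a paired form (the groups were matched) was used. The sketch below assumes independent samples and the pooled-variance statistic. `pooled_t` is a hypothetical helper, and the ages are invented for illustration, not taken from the study:

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic with pooled variance (assumes independent samples).

    Returns (t, degrees of freedom).
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, na + nb - 2


# Hypothetical ages in months, for illustration only (not the study's data).
e_ages = [70, 84, 95, 101, 110]
c_ages = [72, 83, 97, 100, 112]
t, df = pooled_t(e_ages, c_ages)
print(f"t = {t:.3f}, df = {df}")
```

A t that falls short of the tabled critical value for its degrees of freedom corresponds to the text’s finding that homogeneity was “tenable”.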


STATISTICAL TESTS, RESULTS AND DISCUSSION 


According to the theory it was assumed that stuttering and non-stuttering children 
would not differ with regard to experiencing mild feelings of distance anxiety and 
of devaluation of the mother, and mild fears of disaster, expressed in test responses 
to the MCR Test calling for a score of L (low). It was expected, however, that 
stuttering and non-stuttering children would differ in experiencing such feelings in 
intense form, expressed in test responses calling for a score of H (high). For the 
purpose of statistical testing of the hypotheses, the percentage of H scores within 
all scorable responses given to the MCR Test was computed for each variable and 
the raw scores derived from each variable were transformed into proportional scores,
representing the proportion

$$\frac{\text{number of H scores}}{\text{number of scorable responses}}.$$

These proportional scores were computed for each record as the basis for comparing the E and C groups.
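The proportional score for one variable in one record is the number of H scores divided by the number of scorable responses. A minimal sketch, assuming each response has already been coded as the string "0", "L" or "H"; that encoding and the helper name are our assumptions:

```python
from collections import Counter
from typing import Iterable

# Assumed coding of the three intensity levels described in the scoring rules.
VALID_SCORES = {"0", "L", "H"}


def proportion_high(scores: Iterable[str]) -> float:
    """Number of H scores divided by the number of scorable responses."""
    counts = Counter(scores)
    unknown = set(counts) - VALID_SCORES
    if unknown:
        raise ValueError(f"unexpected score codes: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("record has no scorable responses")
    return counts["H"] / total


# One hypothetical record's DA scores (illustrative, not from the study).
print(proportion_high(["0", "L", "H", "H", "0"]))  # 0.4
```

L scores count toward the denominator but not the numerator, which matches the assumption that only intense (H) responses should distinguish the groups.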

This study was so designed that hypotheses I, II, and III called for a comparison 
between the E and C groups, while the subsidiary hypothesis IIIa called for an
intra-group comparison between the two subgroups of the E group. 

Hypotheses I, III, and IIIa were tested by a 2 × 2 test of independence, yielding
a χ² with 1 df (one-tailed test). The Yates correction for continuity was applied in
the calculation of χ² (Edwards, 1951). Hypothesis II was tested by the Kolmogorov-Smirnov
test for two samples (Goodman, 1949).
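The two tests named above can be sketched as follows. The text does not report the contingency tables or say how each record was classified for the 2 × 2 test, so the sketch assumes each record is counted by group (E or C) and by whether its proportional score lies above some cut-off. The one-tailed p uses the fact that a 1-df χ² is the square of a standard normal deviate; this holds only when the difference lies in the predicted direction. The Kolmogorov-Smirnov function returns only the statistic D, and critical values would still come from published tables. All function names and counts are hypothetical:

```python
import math
from bisect import bisect_right
from typing import Sequence


def yates_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square with Yates' continuity correction for the table [[a, b], [c, d]].

    Rows are groups (E, C); columns are "above cut-off" and "not above cut-off".
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    # Shrink |ad - bc| by n/2, but never below zero (a common convention).
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    return n * diff ** 2 / margins


def one_tailed_p_1df(chi2: float) -> float:
    """P(Z > sqrt(chi2)) for a standard normal Z, i.e. half the 1-df chi-square tail.

    Only meaningful when the observed difference is in the predicted direction.
    """
    return 0.5 * math.erfc(math.sqrt(chi2 / 2))


def ks_two_sample_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the ECDFs."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    # The ECDFs are step functions, so the maximum gap occurs at an observed value.
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


# Hypothetical counts, not the study's data: E group 15 above / 5 not; C group 5 / 15.
chi2 = yates_chi2_2x2(15, 5, 5, 15)
print(round(chi2, 2))  # 8.1
print(one_tailed_p_1df(chi2))

# Hypothetical proportional MD scores for two small groups.
print(ks_two_sample_d([0.1, 0.2], [0.15, 0.3]))  # 0.5
```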


Discussion 
The result of the statistical test of Hypothesis I was found to be significant at
the 0.005 level; the result for Hypothesis II was significant at the
0.1 level only; the result for Hypothesis III did not quite meet the requirements for
significance at the 0.05 level, while the result for Hypothesis IIIa was significant
at the 0.05 level.

The value of this study should, however, not be determined in terms of quantitative 
results alone. It was felt that a qualitative analysis of the responses given to the 
MCR Test was of equal importance, as it contributed further to our clinical under- 
standing of the stuttering child. 

The results of the statistical test of Hypothesis I warrant acceptance of the
hypothesis that stuttering children experience intense distance anxiety concerning
their mothers much more often than non-stuttering children. Through an analysis of
the original test responses the specific quality of distance anxiety was illustrated.
Among the responses scored high DA, the most striking ones were those summarized
under the headings “Theme of the endless search for the mother” and “Theme of
the lost child”. Other symbolic images frequently used by stuttering children were 
of the kind: the door is locked, the house is locked, the child sees his mother behind 
a closed window but cannot get at her, or the mother is moving away in a moving 
van and the child tries in vain to follow her. All these symbols convey strikingly 
the child’s feelings that the mother is inaccessible and that he is helpless in the face 
of insurmountable obstacles. 

Hypothesis II was found tenable at the 0.1 level of significance only. Feelings of
devaluation of the mother were, in both groups, expressed in a large variety of ways. 
Responses expressing specifically anal hostility—as mentioned by psychoanalytical 
investigators—were negligible in both groups. It was felt rather, that the stuttering 
child experiences separation or distance from the mother—which may be caused by 
life circumstances beyond the mother’s control—as an unfriendly, teasing or hostile 
act on her part to which he reacts with intense anger. Thus, the stuttering child seems 
to be locked in a perpetual difficulty: the original distance anxiety leads to increasing
anger and hostility against the mother, and these aggressive feelings, while common
to the whole age group, lead in the stutterer to fear of abandonment as retaliation
by the mother, thus reinforcing the original distance anxiety. It is the combination
of distance anxiety and anger which makes the stuttering child differ from the
non-stutterer, rather than his aggressive feelings seen in isolation.⁴


⁴ It should be mentioned here that after repeated analysis of the subjects’ responses to the
MCR Test it was felt that the variable MD (mother devaluated) proved to be less well-chosen 
than the two other dependent variables, DA (distance anxiety) and FD (fear of disaster). 
The responses scored as MD were actually not homogeneous enough to warrant their being 
classified under the same heading. In future studies in which the MCR Test will be utilized, 
at least two specific factors should be differentiated. In one group of responses the children 
expressed predominantly anger, irritation and hostile wishes directed against the mother, while 
in the other group the children’s angry feelings appeared already projected upon the mother, 
who was then perceived by the child as a disappointing, angry, hostile, mean or ugly person, 
a devaluated love-object. In a research study conducted at the present time—supported in 
part by a grant from the Institute of Mental Health, United States Public Health Service—a 
more refined analysis of the responses expressing a child’s negative feelings towards his mother 
is being planned, which should lead to a more differentiated scoring system. 

Results of the statistical tests of Hypotheses III and IIIa confirmed our expectation
that children in the advanced stages of stuttering tend to experience intense fears of 
disaster, while such fears are rarely experienced by non-stuttering children or by 
children in the initial stage of stuttering. 


An inspection of the data revealed that the variable FD differed qualitatively from 
the variables DA and MD. Fears of disaster were either experienced at a high degree 
of intensity or they were not experienced at all. Thus, it seemed that with the 
appearance of fears of disaster a crucial change had occurred in the child’s world, 
which should be of prognostic importance. 


The meaning of these findings was discussed in the light of psychoanalytic theories 
concerning the mechanism of depression in children (Bibring, 1953; Erikson, 1950;
Jacobson, 1946). The developmental crisis theory of stuttering was then modified
in the following manner: 


The stuttering child has experienced a disruption of the patterns of complementary 
behaviour which are of vital importance for the learning of language. He reacts to 
this experience with anxiety, anger, and with devaluation of the love object. In his 
overt behaviour he alternates between excessive clinging to his mother, expressing his 
distance-anxiety, and aggressive behaviour against her, expressing his frustration and 
anger. He has experienced ego-failure and continues to experience it every time
he tries to express himself in words.


Whether or not these combined experiences will eventually lead to a re-activation 
of the early infantile depressive position, to a devaluation of the whole world and 
to fears of impending disaster, will depend on two interrelated factors. The first one 
will be the child’s degree of anxiety-tolerance, an aspect of his ego strength, which in
turn will depend on the severity of earlier disappointments in the mother, experienced 
during the pre-linguistic stage of development. The second determining factor must 
be seen in the mother’s symptom-tolerance, in her reactions to the child’s stuttering and 
to his alternately demanding or aggressive behaviour. The inter-relationship between 
these two factors, the child’s anxiety-tolerance and the mother’s symptom-tolerance, 
will determine the appearance or non-appearance of depressive anxiety and the choice 
of defensive mechanisms. 


In modification of our original theory, we would then expect a high degree of 
similarity in the psychological status of all children in the initial stage of stuttering, 
with acute distance-anxiety as the outstanding and significant psychological mechanism, 
and an increasing diversity in the psychological status of children in the advanced 
stages of stuttering, with feelings of depressive anxiety as a possible but not a necessary 
condition of the stuttering syndrome, and with diversity in the development of 
defensive mechanisms. 


IMPLICATIONS OF THIS STUDY


In this study stuttering has been treated as a disturbance in language learning. 
The conceptual approach to the phenomena of language learning differed, however, 
from the behaviouristic paradigm employed in other studies on the subject. Learning 
paradigms derived largely from animal behaviour do not appear to be adequate models 
for the analysis of the complex phenomena of symbolic learning occurring within an 
interpersonal matrix. In particular, the formal structure of language with which we
are faced in the learning of grammatical patterns suggests, as Mowrer has put it,
“a whole spectrum of problems to which the learning theorist has thus far hardly 
given a passing glance” (Mowrer, 1954). 

The primary learning of language in early childhood must be seen as a form of 
learning through imitation. In this study it was demonstrated that the conceptual 
schemata developed by Piaget in his studies of sensori-motor learning through 
imitation of a model, and Baker’s theories concerning reciprocal identification between 
speech partners, provide us with a fitting frame of reference for the study of the 
phenomena of language learning in children and of developmental language disorders. 

In order to evaluate the contribution which this study makes to our understanding
of language development, a comparison was made between it and Bowlby’s report of the
effect of maternal deprivation upon language development (Bowlby, 1952).
Several basic principles concerning the effect of the mother-child relationship upon 
language development in children were formulated. 

1. If a child has no opportunity to establish reciprocal identification with a 
significant adult or, in more concrete terms, if no continuous model for language 
imitation is available during the pre-symbolic stage, the child will show severe 
retardation in language development. 

2. If the child has already reached the early symbolic stage (stage of naming) 
when he experiences traumatic separation from the mother-model and no acceptable 
substitute is provided, the child will either regress to pre-symbolic babbling or will 
completely give up talking. 

3. If traumatic separation from the mother occurs while the child is in the 
practising stage of early grammatical speech, the child will regress to compulsive 
repetition of sounds and syllables and will develop the symptomatology of stuttering. 

4. In all such cases infantile depression has been reported as a possible later
sequel to the experience of separation and of speech disorder.

Further research is needed to elucidate the problem of individual differences in 
the reaction of children at different age levels to temporary separation from the 
mother. In particular, there is a need to study the “safety limit”, or the period of
time during which children at different stages of development can be exposed to 
separation without experiencing disruption of language development, and the ways in 
which children and mothers cope successfully with temporary separation and distance 
from each other. 

The extent and frequency of depressive anxiety in older children in the advanced 
stages of stuttering should be investigated together with the development of defensive
mechanisms and the effect which the mother’s reaction to stuttering in the child 
has upon the choice of these mechanisms. 


IMPLICATIONS FOR THERAPY 


From the theory presented, three propositions of importance for therapy were
deduced:

1. The mother of the stuttering child has to be included in the therapeutic
process.

2. Therapy with a stuttering child should be initiated as soon as possible
after the appearance of compulsive repetitions.

3. Therapeutic techniques have to be specific for children of different ages and
in different stages of stuttering.

The aim of therapy is the re-establishment of the disrupted reciprocal identification
between mother and child. The characteristics of “closeness therapy” for the young
child and his mother and of “interpretative therapy” for the older child have been
briefly developed.* While these forms of therapy have proved to be very successful
with children in the early stages of stuttering, therapy becomes more time-consuming
and less promising once the child has developed elaborate defense mechanisms.
Research and experimentation with stuttering children in puberty and adolescence are 
urgently needed. 


* For a more detailed discussion of the forms of therapy proposed, see Wyatt, 1956. 
THE PLURAL NUMBER OF NOUNS*
ANDREAS KOUTSOUDAS 


University of Michigan 


A set of rules has been constructed to resolve the ambiguity of the grammatical number
of certain Russian nouns. These rules were tested on 2,655 ambiguous nouns taken
from a 30,152-word text in the field of physics. After certain modifications, the
rules identified the grammatical number correctly in approximately 99% of cases.


PROBLEM 


The problem we are presently concerned with is how to identify the grammatical
number of a Russian noun whenever form alone fails to provide the distinction. For
example, since the noun выражения may be either genitive singular (expression) or
nominative-accusative plural (expressions), without a change of suffix¹ (-ия in both
cases), a set of operational rules is required to enable an electronic computer, when
translating from Russian into English, to identify its grammatical number correctly.


PROCEDURE AND RESULTS 
There are three groups of nouns having identical suffixes in the singular and plural
number. These are:

I. Feminine, inanimate, palatal and non-palatal nouns ending in -а and -я
   Examples: газета = newspaper
             миля = mile
II. Feminine, inanimate, palatal nouns ending in -ь and -ия
   Examples: кость = bone
             профессия = profession
III. Neuter, inanimate, palatal and non-palatal nouns ending in -о, -ие, and -е
   Examples: вино = wine
             выражение = expression
             море = sea

Members of these groups found in a text² of 15,000 running words were listed


*Under the author’s supervision, the present study was conducted at the University of Michigan with 
research funds provided by Project Michigan, under U.S. Army Signal Corps Prime Contract Number 
DA-36-039-SC-57654. The study was initiated by The RAND Corporation when the author was
consulting there in the summer of 1957.


¹By “suffix” we mean both inflectional and derivational morphemes, i.e., both a “suffix” and an
“ending”.


²The source used for the present analysis was the Zhurnal Eksperimental’noi i Teoreticheskoi Fiziki
(Journal of Experimental and Theoretical Physics), Vol. 28, No. 1, 1955. Pages 1-59 comprised our
first sample of 15,000 running words and pages 60-128 our second.
alphabetically together with the immediate context of the three preceding and three
following words. The actual number of each word was identified from the translation, 
and correlations between the actual number and the immediate context were sought. On 
the basis of the correlations observed, a set of operational rules was constructed. These 
rules are given in Table 1. 
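The listing step described above is essentially a keyword-in-context (KWIC) concordance. The sketch below shows one way such a list might be built; the function name `kwic`, whitespace tokenisation and the sample sentence are illustrative assumptions, since the paper does not say how its lists were prepared. Python's default string ordering follows Unicode code points, which matches Russian alphabetical order for the basic letters but places ё after я.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, List[str], List[str]]  # (noun, preceding words, following words)

def kwic(tokens: List[str], targets: Set[str], width: int = 3) -> List[Entry]:
    """List each occurrence of a target word with up to `width` words on either side,
    sorted alphabetically by the target word (hypothetical helper)."""
    entries: List[Entry] = []
    for i, tok in enumerate(tokens):
        if tok in targets:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            entries.append((tok, left, right))
    # sort() is stable, so occurrences of the same word stay in text order
    entries.sort(key=lambda entry: entry[0])
    return entries

# Illustrative sentence; a real run would use the whole physics text
text = "эти выражения дают вид поля и выражения для энергии"
for noun, left, right in kwic(text.split(), {"выражения", "поля"}):
    print(" ".join(left), "[", noun, "]", " ".join(right))
```

Each printed line corresponds to one entry of the kind the author examined: the ambiguous noun with three words of context on each side.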


TABLE 1 





General Note: It is assumed that the word in question has already been identified as a
member of the “Noun” class with a specific gender. This word is to be considered
together with a context unit. A context unit is the word, group of words, or
punctuation mark occurring immediately before and after it, not counting: (1) particles,
(2) personal pronouns, and (3) adverbs of positive degree. Adjectives and participles
function alike and have been called “Adjective Class Members”.


Rules
No. 1. If two nouns are separated by a conjunction, then the second is plural
if the first is plural.
Choose the plural translation for any member of groups I, II, and III if it is
preceded by:
No. 2. A plural Adjective Class Member
No. 3. A punctuation mark, or a conjunction preceded by a punctuation mark
No. 4. A plural or infinitive verb, or a gerund
No. 5. A singular verb and a singular noun
No. 6. An idiomatic expression
No. 7. A numeral other than один-, одн- (forms of “one”)
No. 8. Any form of что (“what”, “that”)
No. 9. Two adjectives separated by a conjunction
No. 10. A noun followed by 2 or more mathematical formulas (or symbols)
separated by a conjunction
No. 11. All the instances to which the above rules do not apply automatically
become singular.
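As an illustration of how the rules of Table 1 might be applied mechanically, here is a partial Python sketch. The `Token` class, its word-class tags (`"adj"`, `"punct"`, `"conj"` and so on) and the function names are hypothetical; like the General Note, the sketch assumes every word has already been classified. Only Rules 1 to 4, 7, 8 and the default Rule 11 are encoded, because Rules 5, 6, 9 and 10 depend on word-order details or idiom lists that the table does not spell out. Note that the rules choose an English translation: after a numeral such as два the Russian noun is grammatically genitive singular, yet the plural translation (“two expressions”) is required.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical tag names for the classes the General Note says to skip
IGNORED = {"particle", "personal_pronoun", "adverb_positive"}
CHTO_FORMS = {"что", "чего", "чему", "чем", "чём"}  # forms of что (Rule 8)

@dataclass
class Token:
    text: str
    pos: str                      # e.g. "noun", "adj", "participle", "verb", "punct", "conj", "numeral"
    number: Optional[str] = None  # "sg", "pl", or None if unknown or ambiguous
    form: Optional[str] = None    # for verbs: "inf", "gerund", or None

def preceding_units(tokens: List[Token], i: int, n: int = 2) -> List[Optional[Token]]:
    """Up to n context units before tokens[i], nearest first, padded with None."""
    units: List[Optional[Token]] = []
    j = i - 1
    while j >= 0 and len(units) < n:
        if tokens[j].pos not in IGNORED:
            units.append(tokens[j])
        j -= 1
    return units + [None] * (n - len(units))

def choose_number(tokens: List[Token], i: int) -> str:
    """Return "pl" or "sg" for the ambiguous noun at index i (partial sketch of Table 1)."""
    p1, p2 = preceding_units(tokens, i)
    if p1 is None:
        return "sg"
    # Rule 1: noun + conjunction + ambiguous noun -> plural if the first noun is plural
    if p1.pos == "conj" and p2 is not None and p2.pos == "noun" and p2.number == "pl":
        return "pl"
    # Rule 2: plural Adjective Class Member (adjective or participle)
    if p1.pos in ("adj", "participle") and p1.number == "pl":
        return "pl"
    # Rule 3: punctuation mark, or conjunction preceded by a punctuation mark
    if p1.pos == "punct" or (p1.pos == "conj" and p2 is not None and p2.pos == "punct"):
        return "pl"
    # Rule 4: plural or infinitive verb, or gerund
    if p1.pos == "verb" and (p1.number == "pl" or p1.form in ("inf", "gerund")):
        return "pl"
    # Rule 7: numeral other than forms of один (stems один-, одн-)
    if p1.pos == "numeral" and not p1.text.lower().startswith(("один", "одн")):
        return "pl"
    # Rule 8: any form of что
    if p1.text.lower() in CHTO_FORMS:
        return "pl"
    # Rules 5, 6, 9 and 10 are omitted in this sketch; Rule 11: everything else is singular
    return "sg"

phrase = [Token("два", "numeral"), Token("выражения", "noun")]
print(choose_number(phrase, 1))  # pl (Rule 7)
```

The rules are tried in order and the first match wins, so the default in Rule 11 is reached only when no earlier pattern applies.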











A statistical breakdown of the noun groups, the number of cases examined, and the 
number of right and wrong identifications obtained with the aid of the rules is presented 
in Table 2. 

As can be seen, there were 973 singulars and 210 plurals, or a total of 1,183 cases of
nouns in 15,000 words of text where no distinction could be made as to the grammatical
number by suffix alone. By applying the rules of Table 1, 1,165 cases (approximately
98.5%) were successfully identified.
TABLE 2

Group      Singular          Plural            Total
           T     R     W     T     R    W      T      R      W
I          199   197   2     51    50   1      250    247    3
II         288   286   2     45    45   0      333    331    2
III        486   479   7     114   108  6      600    587    13
Total      973   962   11    210   203  7      1,183  1,165  18

T = total cases, R = cases rightly identified, W = cases wrongly identified.


To test the validity of these rules further, an additional text³ of 15,000 running words
was used.

The members of groups I, II, and III found in the new text were collected and listed 
with a linear environment of three words before and three words after each noun. The 
rules were then tested on the new corpus. The results are given in Table 3. 











TABLE 3

Group      Singular            Plural            Total
           T      R      W     T     R    W      T      R      W
I          321    319    2     71    64   7      392    383    9
II         483    479    4     31    31   0      514    510    4
III        472    467    5     94    86   8      566    553    13
Total      1,276  1,265  11    196   181  15     1,472  1,446  26











In the new text, there were 1,276 singular cases and 196 plural, or a total of 1,472
ambiguous cases. By applying the rules of Table 1, 1,446 cases (approximately 98.2%)
were correctly identified and 26 cases (approximately 1.8%) were wrongly identified.


³See footnote 2.
Finally, some of the rules were modified slightly and retested on the entire 30,152
running words of text. The revisions and the final results are given in Tables 4 and 5. 


TABLE 4

The following changes were made in the original assumption and rules shown in
Table 1:
1. General Note: Include the pronouns её (“her”) and его (“his”, “its”).
2. Rule 3: Opening and closing quotation marks are not considered punctuation.
3. Rule 6: Only the following cases will be considered idioms:
   (a) между собой (“among themselves”), (b) таким образом (“thus”),
   (c) играют роль (“play a role”), (d) в этом смысле (“in this sense”),
   (e) принимая во внимание (“taking into account”), (f) тогда как (“whereas”)
   and (g) так как (“since”).
4. Rule 8: Include когда (“when”) and т.е. (“i.e.”).
5. Rule 10: Add “or a comma” after “... by a conjunction”.
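The revisions to Rules 3 and 6 amount to a preprocessing step and a fixed lookup list. A minimal sketch, assuming the text has already been split into words; `IDIOMS`, `QUOTES`, `strip_quotes` and `preceded_by_idiom` are hypothetical names, the set of quotation-mark characters is an assumption, and only the exact word sequences listed in Table 4 are matched (no inflected variants).

```python
from typing import List

# Revised Rule 6: only these seven expressions count as idioms
IDIOMS = [
    ("между", "собой"),
    ("таким", "образом"),
    ("играют", "роль"),
    ("в", "этом", "смысле"),
    ("принимая", "во", "внимание"),
    ("тогда", "как"),
    ("так", "как"),
]

# Revised Rule 3: quotation marks are not treated as punctuation (assumed character set)
QUOTES = {'"', "«", "»", "“", "”", "„"}

def strip_quotes(words: List[str]) -> List[str]:
    """Drop quotation marks so that they cannot trigger Rule 3."""
    return [w for w in words if w not in QUOTES]

def preceded_by_idiom(words: List[str], i: int) -> bool:
    """True if one of the listed idioms ends immediately before words[i]."""
    before = [w.lower() for w in words[:i]]
    return any(tuple(before[-len(idiom):]) == idiom for idiom in IDIOMS)

words = strip_quotes("таким образом « выражения » для поля".split())
print(preceded_by_idiom(words, 2))  # True
```

Removing the quotation marks before the rules are applied means a noun that opens a quotation is judged by the word before the quotation mark rather than by the mark itself.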
TABLE 5
Group      Singular            Plural            Total
           T      R      W     T     R    W      T      R      W
I          520    518    2     122   118  4      642    636    6
II         771    766    5     76    76   0      847    842    5
III        958    952    6     208   200  8      1,166  1,152  14
Total      2,249  2,236  13    406   394  12     2,655  2,630  25






































The final results show that there were 2,249 singular cases and 406 plural, or a total of
2,655 cases in 30,152 running words of text. With the aid of the modified rules, 2,630
cases (approximately 99%) were correctly identified. In other words, using these rules the
computer can identify the grammatical number of members of groups I, II, and III (singular or
plural) correctly approximately 99 per cent of the time. The remaining error of 1% was primarily
due to mispunctuation and to the lack of patterning whenever a noun preceded a member
of the groups examined. More data will be analyzed to continue testing the applicability
of the rules and to improve them where possible.
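The figures in Tables 2, 3 and 5 can be cross-checked with a few lines of Python. The sketch below (table contents transcribed from the paper; function names hypothetical) confirms that R + W = T in every cell, recomputes the overall accuracy of each test, and checks that the case counts of Table 5 are the sums of those in Tables 2 and 3. It reads the plural R entry for group I in Table 5 as 118, the only value consistent with T = 122, W = 4 and the column totals.

```python
from typing import Dict, Tuple

Cell = Tuple[int, int, int]               # (T, R, W)
Table = Dict[str, Tuple[Cell, Cell]]      # group -> (singular, plural)

TABLE2: Table = {"I": ((199, 197, 2), (51, 50, 1)),
                 "II": ((288, 286, 2), (45, 45, 0)),
                 "III": ((486, 479, 7), (114, 108, 6))}
TABLE3: Table = {"I": ((321, 319, 2), (71, 64, 7)),
                 "II": ((483, 479, 4), (31, 31, 0)),
                 "III": ((472, 467, 5), (94, 86, 8))}
TABLE5: Table = {"I": ((520, 518, 2), (122, 118, 4)),
                 "II": ((771, 766, 5), (76, 76, 0)),
                 "III": ((958, 952, 6), (208, 200, 8))}

def grand_total(table: Table) -> Cell:
    """Sum T, R and W over both numbers and all three groups."""
    cells = [cell for pair in table.values() for cell in pair]
    return (sum(c[0] for c in cells), sum(c[1] for c in cells), sum(c[2] for c in cells))

def accuracy(table: Table) -> float:
    """Percentage of cases rightly identified; first checks R + W = T in every cell."""
    for pair in table.values():
        for t, r, w in pair:
            assert r + w == t, (t, r, w)
    t, r, _ = grand_total(table)
    return 100 * r / t

def pooled_counts_match(a: Table, b: Table, pooled: Table) -> bool:
    """True if every T in `pooled` equals the corresponding T in `a` plus that in `b`."""
    return all(pooled[g][k][0] == a[g][k][0] + b[g][k][0] for g in pooled for k in (0, 1))

for name, table in (("Table 2", TABLE2), ("Table 3", TABLE3), ("Table 5", TABLE5)):
    print(name, grand_total(table), f"{accuracy(table):.1f}%")
print("Table 5 T = Table 2 T + Table 3 T:", pooled_counts_match(TABLE2, TABLE3, TABLE5))
# Table 2 (1183, 1165, 18) 98.5%
# Table 3 (1472, 1446, 26) 98.2%
# Table 5 (2655, 2630, 25) 99.1%
# Table 5 T = Table 2 T + Table 3 T: True
```

Only the T columns add across the tables; the R and W columns of Table 5 differ from the sums because the retest used the revised rules.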
THE EFFECTS OF EARLY DEPRIVATION ON SPEECH DEVELOPMENT:
A COMPARATIVE STUDY OF 4-YEAR-OLDS IN A NURSERY SCHOOL
AND IN RESIDENTIAL NURSERIES


M. L. KELLMER PRINGLE AND MARGARET TANNER 
University of Birmingham 


Two groups of pre-school children, matched for age, sex, intelligence and home 
background, were given a battery of verbal tests and their conversations were recorded 
during periods of free play. Data were obtained on vocabulary and sentence structure 
under controlled and spontaneous conditions, on the children’s ability to understand 
and express themselves in simple sentences and on verbal expression in social 
intercourse. In all quantitative comparisons, the nursery school children were found 
to be in advance of the children in residential nurseries. Among the qualitative differences
noted was the extent to which speech was used for establishing social contacts with
contemporaries and for obtaining adult attention. Although there was considerable 
overlap in the achievements of the two groups, our evidence confirms that there 
is some retardation in the language skills of pre-school children in care. 


INTRODUCTION 


The effect of early maternal deprivation on children’s development has been the
subject of many studies in recent years. There is a good deal of evidence to show 
that the various aspects of speech tend to be most seriously affected. A child’s 
language development reflects the level of his intellectual, emotional and social growth. 
Though interrelated, these aspects can to some extent be studied separately. The 
gradual mastery of vocabulary, of sentence structure and of the logical expression
of ideas is an indication of the extent to which the growth of intelligence has 
been influenced by experience. Using speech for making contact with adults and 
contemporaries reflects its social aspect. Verbally expressed feelings—whether in 
reality or phantasy situations—give some insight into the child’s emotional life. 
Thus “the act of speech is a meeting ground for functions and activities of the 
organism (mental and physiological) at all levels. Speech production is achieved 
through the co-ordination of muscular, respiratory and neural activities on the one 
hand, and of cultural, intellectual and emotional factors on the other” (Goldman- 
Eisler, 1958). 


This investigation aimed at a quantitative and qualitative analysis of the differences 
(if any) in speech development between pre-school children in residential care and 
those living with their own families. Answers to four questions were sought: 
a) Are pre-school children in care retarded in their language development?
b) If so, are all aspects equally affected, and in what specific ways?
c) Even if retarded, is their speech nevertheless developing along normal lines?
d) Are there any differences between the two groups in the use of speech for social intercourse?


PREVIOUS STUDIES 


All available results indicate that deprivation exerts its most adverse effect on
speech development. Bowlby, reviewing recent literature, concludes that “the least
affected is neuromuscular development, including walking, other locomotive activities
and manual dexterity. The most affected is speech, the ability to express being more
retarded than the ability to understand” (Bowlby, 1951). How early the effects of
environmental deprivation are manifested is shown in an investigation of new-born 
infants (Gatewood and Weiss, 1930). Given various stimuli such as light, sound, 
smells and temperature, neonates vocalised much more than in situations where they 
were “allowed to lie naturally without any external stimulation.” Brodbeck and 
Irwin (1946) compared the frequency and variety of phonemes uttered by a group 
of orphanage children with those heard among a group living with their own families. 
Statistically significant differences were found in favour of the latter as early as 
the first two months. Freud and Burlingham (1954) reported retardation in language 
development during the second year of life. 

Williams and McFarland (1937) applied a vocabulary test to 64 orphanage children 
and compared them with a large group of children living in their own homes. The 
latter had a markedly superior vocabulary, much more so than could be accounted 
for on the basis of I.Q. or socio-economic level. Moore (1947), using the same 
vocabulary test, analysed two-minute samples of oral language of orphanage and non- 
orphanage children. Again, the former group was markedly retarded. Holding C.A. 
and M.A. constant, analysis of variance showed a statistically significant difference 
attributable to environmental influences. Goldfarb (1943 and 1945), in a comparative 
study of institutionalised and fostered children, investigated speech sounds, intelligibility 
of speech and level of language organisation at three age levels: in early infancy, 
at 6-8 years and in adolescence. At each age level the institution children showed 
marked language deficiency in all the areas measured. 

That children in orphanages and institutions are seriously retarded in vocabulary 
and language development has been shown in a number of other investigations 
(Little & Williams, 1937; Skeels et al., 1938; Flemming, 1942). Although such 
children undoubtedly come from lower socio-economic levels, are somewhat below 
average ability and probably have restricted environmental experiences, their retardation 
appears to be so marked that it is necessary to look for additional causative factors. 
Even when matched with ordinary children for mental age, the deprived were much 
more retarded in vocabulary (Little & Williams, 1937). If association with adults
facilitates language development, institutionalised children may show marked retardation
because they associate most with other children, especially contemporaries, and
have far fewer individual contacts with, and less attention from,
adults (McCarthy, 1930; Smith, 1935).


Dawe planned a training scheme with a group of orphanage children aimed at
increasing their understanding and use of language symbols (Dawe, 1942). Eleven
pairs of children were matched for age, sex, school group, M.A., I.Q. and score on the
Smith-Williams vocabulary test; one member of each pair received the training and
the other served as a control. About 50 hours of individual and small-group teaching
was given, providing the kind of enriching experiences children receive as a matter of
course in good, educated homes. The experimental group showed significant gains
(at the 1% level of confidence) in most language measures; moreover, these language
gains were reflected in an increase in average I.Q. from 80.6 to 94.8 as a result
of the 50-hour training, which was spread over a period of three months. During
the same period the mean I.Q. of the control group decreased slightly, from 81.5 to
79.5. The implications of these findings could be quite far-reaching, not only
for deprived children but for education generally, if they can be substantiated with
larger groups.


Confirming the finding that infants brought up in institutional environments show 
severe developmental retardation, particularly in linguistic ability, Roudinesco and 
Appell (1950) introduced a change of regime; this was designed to provide the 
children with more individual attention from the nurses and other attendants. A 
re-test after a period of 18 months showed considerable gains in motor, social and 
adaptive behaviour, but the least improvement was brought about in language 
development. Gesell and Amatruda (1947), discussing the dynamics of environmental 
deprivation, stress that it operates by “attrition” as well as by “impoverishment”
and that the results tend to be cumulative. The more monotonous and impersonal 
character of an institutional environment appears to reduce early vocalisation in 
contrast to the stimulation provided by the variety and warmth of normal family life. 


The beneficial effect of regular contacts with adults outside the Children’s Home 
was shown in a recent study of 8-, 11- and 14-year-olds (Kellmer Pringle and Bossio,
1958). Backwardness in language development was least marked among those children 
who since their removal from home had maintained a continuing relationship with 
a member of their family or a family-substitute. This supports the view (McCarthy 
1952) that language development depends to a considerable extent upon the child’s 
identification with his mother. When there is a close contact with her or a mother-substitute,
the child continues to strive to communicate his thoughts and experiences;
personal interest and continuity of contact seem to provide the motivational forces 
needed to stimulate this learning process. 
THE SUBJECTS


One nursery school and three residential nurseries were used to obtain the required 
number of children. It was decided to match the two groups for age, sex, intelligence 
and home background. To eliminate the complicating influence of mental dullness,
only those whose level of tested intelligence was at least average were chosen.
Altogether 50 children were given a screening examination which made it possible
to obtain 18 matched pairs. The age, sex and I.Q. distributions are shown in Tables 1
and 2. The difference between the mean I.Q.s is not statistically significant (t = 1.4).
Henceforth the nursery school group will be referred to as Group N and the 
children in the residential nurseries as Group R. 


TABLE 1
                     Group N              Group R
Age                  Boys    Girls        Boys    Girls
Under 4 years        0       1            1       0
4 yrs. to 4.5        5       6            2       4
4.6 yrs. to 4.11     5       1            7       4
Total                10      8            10      8
Mean age             4 years 4 months     4 years 6 months
S.D.                 3.3 months           4.1 months

Age and sex distribution in the two groups of children.


TABLE 2
M.A. range       Group N   Group R
3.6 to 3.11      0         1
4.0 to 4.5       1         3
4.6 to 4.11      3         3
5.0 to 5.5       8         6
5.6 to 5.11      3         1
6.0 to 6.5       3         4

                 Group N                  Group R
I.Q. range       Boys   Girls   Both      Boys   Girls   Both
90 - 109         5      0       5         4      4       8
110 - 129        5      3       8         3      3       6
130 - 150        0      5       5         3      1       4
Total            10     8       18        10     8       18
Mean I.Q.        120                      113
S.D.             15.2                     14.8

Mental age and intelligence quotients on the Merrill Palmer Scale.
Note. In calculating the mental ages and intelligence quotients, the verbal items of the scale
were omitted although they had been administered. These items are considered separately
(see page 280).
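The value t = 1.4 quoted for the difference in mean I.Q. can be approximately reproduced from the means and standard deviations in Table 2. The sketch below assumes an ordinary two-sample t statistic; the paper does not say which form of test was used, and because the children were matched in pairs a paired test on per-pair differences might be more appropriate, but those differences are not reported. The function name `two_sample_t` is hypothetical.

```python
import math

def two_sample_t(mean1: float, sd1: float, n1: int,
                 mean2: float, sd2: float, n2: int) -> float:
    """t statistic for the difference between two independent means, from summary statistics.
    Uses the unpooled standard error; with n1 == n2 this equals the pooled-variance t."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se

# Mean I.Q. and S.D. for Group N and Group R (18 children each), from Table 2
t = two_sample_t(120, 15.2, 18, 113, 14.8, 18)
print(f"t = {t:.2f}")  # t = 1.40
```

With 34 degrees of freedom the two-tailed 5% critical value is about 2.03, so a t of about 1.4 is consistent with the statement that the difference is not significant.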
With regard to home background, only a broad similarity proved to be a practicable
aim. It is known that the great majority of children in care come from families 
where educational and cultural standards are low. Prior to separation most of them 
lived in homes where verbal stimulation is minimal. Over-worked and under-privileged 
mothers, often burdened with too many pregnancies or forced by economic necessity 
to go out to work, have little time and energy available to encourage the baby in 
his early experiments with sound and to elicit continued trial and effort by taking 
delight in his pre-speech vocalisations. Similarly, once the child is beginning to 
speak, there is likely to be less verbal stimulation in the form of nursery rhymes, 
stories, songs and general conversation. Thus it could be argued that most deprived 
children are already backward in language development before coming into care. 


A nursery school was therefore selected which catered for children whose home
background was likely to be similarly impoverished and unstimulating. The one 
chosen admitted only “hardship cases”, such as the children of unmarried or 
deserted mothers, or widows, or where there was severe overcrowding or serious 
illness in the home. Even so, the neighbourhood is such that the nursery school has 
a waiting list. Thus one would expect to find little difference between the language 
development of the two groups of children if the main adverse influence were an 
unfavourable home background. Moreover, half of the deprived group came into care 
before the age of 18 months (see Table 3). It could be argued that their linguistic 
ability had been fostered predominantly by the life in residential nurseries. 


TABLE 3
Before 6 months      5
7 - 12 months        1
13 - 18 months       2
19 - 24 months       2
25 - 36 months       3
37 - 48 months       5

Age at which children in Group R came into care.


THE EXPERIMENTAL PROCEDURE 


Each child was given at least two individual interviews in a small room known to 
him and in addition his normal free play in the nursery was observed. In a few 
cases it proved necessary, in order to obtain and maintain full co-operation, to give
several shorter interviews or to administer the tests in a quiet corner of the playroom
because unusual surroundings seemed to have a disturbing effect on some of the
more anxious children. Inevitably the amount of preliminary work necessary to
gain a child’s confidence varied considerably. With the most difficult children the 
formal testing had to be postponed until regular visits had been paid to the nurseries 
over a period of months. 


The aim was to explore the speech development of the two groups in as many
directions as possible with such young subjects. Firstly, two formal aspects of speech, 
vocabulary and sentence structure, were investigated both under spontaneous and 
controlled conditions. Vocabulary was assessed by the following means: 

(a) the Picture Vocabulary Test from the Terman Merrill Intelligence Scale, Form L, 
requiring object naming. 

(b) the Vocabulary sub-test from the Wechsler Intelligence Scale for Children, 
requiring definitions. 

(c) the vocabulary used in free play. 


Secondly, the children’s ability to understand and to express themselves in simple 
sentences, in response to both structured and unstructured test items, was assessed 
by the following tests: 

(d) the appropriate verbal items in the Merrill Palmer Scale for Pre-school Children, 
namely simple questions and action agent. 

(e) the Watts English Language Scale, involving the use of basic sentences and the 
ability to describe pictures. 


Thirdly, spontaneous and undirected verbal expression was assessed by recording the 
children’s conversation during periods of free play, supplemented by careful observation 
of the accompanying behaviour. Each child was observed for a minimum of half 
an hour playing with preferably only one other child; if more children were taking
part, the time allowed was proportionately greater. Since the play had to be entirely 
spontaneous and undirected, the total observation time was in many cases made up 
of a number of shorter periods since at this age social and group play is still fluid 
and frequent changes of partner and activity occur. The following general conditions 
were adhered to: 

1) Speech was not recorded until the children were sufficiently accustomed to the 
experimenter to be able to play naturally and undisturbed by her presence. 

2) If more than four children joined in, observation was discontinued since in a 
larger group more than one conversation was often carried on at once, thus making 
accurate recording difficult. 

3) Sufficient play material was provided to give a choice of activity. 

4) There was no adult participation. If a child spoke to the experimenter, the 
latter, though friendly, would give the minimum response and explain that she had 
some writing to do just now. Strict impartiality was observed when an appeal was 
made to her to arbitrate in a quarrel. 

5) Every word spoken by the children was recorded for later analysis. 
DISCUSSION OF THE FINDINGS


General Observations. 


Different attitudes and reactions to the tests and interviewing procedures were noted 
among the children in the nursery school and those in the residential nurseries. In 
the former, the children responded readily when spoken to and in turn initiated many 
a conversation. When playing, all their energies and attention seemed to be absorbed 
so that it was easy for their activities to be observed and their speech to be recorded. 
On the other hand, invitations to accompany the investigator into a separate room 
(for the individual testing) met in some cases with a refusal at first; this was due
not so much to shyness or lack of co-operation as to a healthy preoccupation with
their own activities, indicated by remarks such as “I’m building a house” or “we
are going to have music now and it is my turn to be in the middle”. Eventually
all but one very anxious boy came along willingly and then the nature of the test
material held their interest. In the residential nurseries the problem was reversed.
Invitations for the individual session were always met most readily; in fact, there
was competition among the children to be singled out in this way, and to obtain 
the undivided attention of an adult their own play activities were willingly abandoned. 
On the other hand, the observation and recording of spontaneous and undirected 
speech proved rather more difficult. In the residential nurseries the presence of a 
visitor seemed to have a disruptive effect. The children stopped their activities when 
the investigator came into the room, watched her movements and then gathered 
around her. Thus it proved necessary to arrange a situation where spontaneous 
speech could be observed. Two or three children known to be friendly with each 
other, were taken into a room with a generous supply of play materials and the 
experimenter played with them for a short while. Then she explained that she had 
some writing to do and the observation period began. Even with these arrangements 
it rarely happened that the children became so absorbed in their play as to entirely 
ignore the investigator’s presence. 

In both groups there were a few shy children but they reacted differently: in the
nursery school they showed some active resistance to leaving their group; in the
residential nurseries they accompanied the investigator readily enough though remaining
passive and refusing to talk during the early part of the interview. It should perhaps be
added here that the nursery school and one of the residential nurseries were excellently
equipped; the other two residential nurseries seemed to have a rather inadequate
supply, especially of materials suitable for imaginative and creative play.


Tests and Play Observations. 
1. Vocabulary. 


(a) The Picture Vocabulary Test of the Terman Merrill Intelligence Scale, Form L, 
consists of 18 pictures of common objects, such as a shoe, a cup and a house. “The
purpose of this test is to determine whether the sight of a familiar object in a picture
provokes recognition and calls up the appropriate name” (Terman and Merrill, 1937).
Table 4 shows the responses of the two groups. 


TABLE 4

No. of words correctly named        Group N   Group R
9 words (2½ year level)             18        18
12 words (3 year level)             18        16
15 words (3½ year level)            16        9
16 words or more (4 year level)     12        7

Terman Merrill Picture Vocabulary. The mental age equivalents assigned by Terman and
Merrill are given in parentheses.


From the 3 year level onwards, Group N received higher scores and the mean mental 
age level achieved by this group was 3 years 9 months as compared with a mean 
level of 3 years 4 months for Group R. Both groups tended to find the same words 
difficult. However, Group N obtained at least 4 more points than Group R on the
following words: basket, glasses, umbrella and pocket-knife. This difference may be 
due to the fact that they represent personal possessions of a type less familiar to 
children in care. 

(b) The Vocabulary Sub-test from the Wechsler Intelligence Scale for Children 
consists of 40 words of increasing difficulty for which the child is required to supply 
the meaning in his own words. The ability to define an object demands verbal skill 
considerably greater than that required for mere object naming. In Table 5 are 
shown the standard scores received by the two groups. No child succeeded beyond the 
first 16 words. Again, Group N made a better showing and obtained the mean 
score expected from an average 4.8 to 4.11 year old child; the mean score of Group
R was below that of an average 4 year old. 


Qualitative comparison. The following words tended to evoke different responses in 
the two groups: bicycle, knife, letter, donkey, fur, nuisance and brave. 


i. BICYCLE. All but one of Group N gave an active definition in terms of “ride it”
or “pedal”. 8 of Group R gave attributes such as “wheels”, “goes ting-a-ling”.

ii. KNIFE. 12 of Group N and 13 of Group R defined it in terms of “cut”; in
addition 5 of Group N defined a knife in terms of “to kill someone”.
TABLE 5
Standard score    Group N   Group R
4                 18        15
6                 18        14
8                 18        11
10                17        9
12                13        7
14                10        6
16                4         2
18                2         1
20                2         0

Wechsler Intelligence Scale for Children Vocabulary.¹


iii. LETTER. All of Group R and half of Group N gave definitions of the “posting”
variety; the remaining half of Group N defined a letter as “what you write”.

iv. DONKEY. All but one of Group N defined it in terms of “ride”. 5 of Group R
gave similar definitions, but 6 gave descriptive definitions, e.g. “goes cl-cl”,
“he bites”.

v. FUR. 2 of 3 definitions given by Group R were “on a cat”, as were 3 of 6
definitions by Group N. In addition 3 of Group N defined fur in terms of
personal use, e.g. “What you put on a cowboy suit”.

vi. NUISANCE. Group N: 4 definitions involving “naughty”; 2 definitions involving
“smack”. Group R: 5 definitions involving “crying”; 2 definitions involving
“naughty”. Group N mentioned various types of aggressive naughtiness, e.g.
“getting mad”, “swearing”. Only Group R mentioned “crying” and
“moaning”.

vii. BRAVE. Group N: 2 definitions only: “kill every body” and “go near things
like tigers”. Group R: 2 definitions only, both “don’t cry”.

Although many definitions given by the two groups were in similar terms, the 
differences described above seem to be significant. The definitions given by Group N 
for the first 5 words tended to be in terms of active participation, compared with a 
more passive mode of description on the part of Group R. Thus bicycles and 
donkeys are ridden, knives are linked with the expression of aggressive impulses and 
letters are written and not merely posted. Insofar as the last two words were
defined at all, Group R had a more negative and passive conception of
goodness. These differences suggest that Group N tended to give more definitions
which were linked with and arose out of vital personal experiences. 

(c) Vocabulary used in free play. This was analysed into the main grammatical 
components of speech. As can be seen from Table 6 there was little difference between 
the two groups except that Group N used a greater proportion of verbs and adjectives. 


1 For norms, see Report of the Committee of Professional Psychologists on the Wechsler 
Intelligence Scale for Children, Appendix B. 
This may reflect their more developed descriptive power, a more active outlook on 
life and a rather more mature sentence form. There was a considerable overlap in all 
the categories, as can be seen from Table 7. With the exception of nouns, Group R 
used comparatively few words not employed by Group N. The total number of words 
used by the latter was 5,039 and by Group R 3,984. The ratio of the average 
number of different words to the average number of total words was practically the 
same for the two groups (0.39 for Group N, 0.41 for Group R). While these results 
give little indication of the total speaking vocabulary of the children, they give some 
information on their relative speech development. 
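The ratio quoted above is a type-token ratio: different words (types) divided by all words spoken (tokens). The Python sketch below is a minimal illustration, assuming each child's free-play record is available as a list of transcribed words; the function names and the two toy records are hypothetical. Following the wording of the text, the averages over children are taken first and only then divided.

```python
from statistics import mean

def types_and_tokens(words):
    """Return (number of different words, total number of words) for one record."""
    normalised = [w.lower() for w in words]
    return len(set(normalised)), len(normalised)

def group_ratio(records):
    """Ratio of the average number of different words to the average number
    of total words, averaging over the children in a group before dividing."""
    counts = [types_and_tokens(r) for r in records]
    avg_types = mean(t for t, _ in counts)
    avg_tokens = mean(n for _, n in counts)
    return avg_types / avg_tokens

# Two toy records (hypothetical, not the study's data)
records = [
    "look at the dog the dog is big".split(),
    "my car goes fast".split(),
]
print(round(group_ratio(records), 3))
```

Averaging before dividing gives each child's record the weight of its length; dividing per child and then averaging would weight short records as heavily as long ones.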


TABLE 6 
Group N Group R 
Nouns 268 209 
Verbs 209 107 
Adjectives 84 57 
Adverbs 24 19 
Conjunctions 9 5 
Pronouns 24 22 
Prepositions 13 9 
Miscellaneous 26 25 


The number of different words used by the children. 


TABLE 7 


                   No. of words used by   Words used only   Words used only 
                   Group N and Group R    by Group N        by Group R 


Nouns                      114                 154                95 
Verbs                       84                 125                23 
Adjectives                  35                  49                22 
Adverbs                     15                   9                 4 
Conjunctions                 5                   4                 — 
Pronouns                    20                   4                 2 
Prepositions                 9                   4                 — 
Miscellaneous               16                  10                 9 


The number and type of words common to the two groups and peculiar to each group. 
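Tables 6 and 7 are linked by simple set arithmetic: for each part of speech, the number of different words used by a group equals the number shared by both groups plus the number used only by that group. The Python sketch below is a consistency check written for this note, not part of the original analysis. It derives the two "only" columns from Table 6 and the "common" column of Table 7, and it reproduces every entry of Table 7, including 4 and 0 for the conjunctions.

```python
# Number of different words per part of speech (Table 6): (Group N, Group R)
TABLE_6 = {
    "Nouns": (268, 209),
    "Verbs": (209, 107),
    "Adjectives": (84, 57),
    "Adverbs": (24, 19),
    "Conjunctions": (9, 5),
    "Pronouns": (24, 22),
    "Prepositions": (13, 9),
    "Miscellaneous": (26, 25),
}

# Words used by both groups (first column of Table 7)
COMMON = {
    "Nouns": 114, "Verbs": 84, "Adjectives": 35, "Adverbs": 15,
    "Conjunctions": 5, "Pronouns": 20, "Prepositions": 9, "Miscellaneous": 16,
}

def exclusive_counts(totals, common):
    """|only N| = |N| - |N and R|, and likewise for Group R."""
    return {
        pos: (n - common[pos], r - common[pos])
        for pos, (n, r) in totals.items()
    }

for pos, (only_n, only_r) in exclusive_counts(TABLE_6, COMMON).items():
    print(f"{pos:<14}{only_n:>5}{only_r:>5}")
```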


i. Nouns. In Table 8 are listed the words used exclusively by at least 4 children in one 
group. It is perhaps somewhat unexpected that the nouns dog, door and window were 
not mentioned by any child in Group R. The only nouns used exclusively by at least 
4 children in Group R were puzzles and shop, words somewhat devoid of emotional 
content and descriptive of toys in their playroom. Although one cannot base any 
conclusions on such slender evidence, it is tempting to wonder whether, to Group R, 
doors and windows are not exciting, policemen are neither feared nor admired, 
castles are not dreamed of, and dogs are never proudly owned. 


TABLE 8 
No. of children 
using word          Group N               Group R 
8                   dog                   — 
6                   door, window          — 
5                   home                  — 
4                   castle, policeman     puzzles, shop 


Words used exclusively by one group. 

An analysis of the nouns common to both groups but used to a markedly greater 
extent by one or the other showed again a tendency for Group R to dwell on their 
actual playthings (Table 9); in contrast, the frequent use of the words car and Mummy 
by Group N suggests that their play activities are more related to experiences outside 
the nursery. 


TABLE 9 
No. of children using 
each word 

Group N Group R 
flowers 6 1 
car 9 2 
Mummy 13 1 
ball 7 13 
toys 2 & 


Nouns common to both groups. 


ii. Verbs. An analysis was made of the verbs most frequently used, and a very 
similar range of verbs was found in the two groups. However, the verbs used 
exclusively by Group N tended to be more active and aggressive ones, whereas those 
of Group R suggested a rather docile, helpful attitude (for example, dance and push 
as against mend and worry). 








280 The Effects of Early Deprivation on Speech Development 


iii. Adjectives. Here too, the range used by the two groups was very similar. 
Except that Group N used a greater number of adjectives, there were no suggestive 
differences in the adjectives used exclusively by either group. 


2. Ability to understand and to use simple sentences. 


(d) Verbal items in the Merrill-Palmer Scale for Pre-school Children. In the 
“Simple Questions” test, the ceiling of which is at the three-year level, both groups 
did equally well. In the more difficult “Action Agent” test Group R attained a mean 
score of 9.2 as compared with 13.6 for Group N (total possible score 20). The number 
of children succeeding at the different age levels assigned to this test by the 
Merrill-Palmer Scale is shown in Table 10. Qualitatively, the responses made by the 
two groups were very similar except for the question “what swims”: all but one of the 
15 Group R children who answered correctly replied “fish” or “ducks”; of the 16 
Group N children replying correctly, half gave the same answer but half mentioned 
“men”, “people”, “boys”, etc. Only one child in Group R mentioned a human being. 


TABLE 10 


No. of children giving 
correct responses 


Age level Group N Group R 
2 year 17 14 
3 year 16 9 
3½ year 14 6 
4 year 14 6 
4½ year 10 4 
5S year 5 0 


Response level on Action Agent Test. 


(e) Watts English Language Scale. “The pictures accompanying the scale were 
designed to measure, in so far as it is measurable, the progress of young children 
in mastering the basic varieties of the English sentence. By means of the scale... we 
should be able to decide with some approach to accuracy how one child compares with 
another of the same age in one important aspect of his linguistic development” 
(Watts, 1944). The scale consists of 36 pictures arranged in six groups of increasing 
complexity. The language ages scored by the children in each group are shown in 
Table 11. Group N did considerably better than Group R, achieving a Mean Language 
Age of 5 years 4 months, against 4 years 7 months for Group R. The difference 
between the groups is most marked at the higher levels, where only one child in 
Group R obtained a score above the median score of Group N. The correlation between 
the Watts Language Age and the Merrill-Palmer M.A. was positive but low: 0.41 for 
Group N and 0.23 for Group R. 


TABLE 11 
Language Ages            No. of children 
(years.months)         Group N     Group R 
4.0 to 4.5                2           8 
4.6 to 4.11               —           6 
5.0 to 5.5                3           3 
5.6 to 5.11               5           1 
6.0 to 6.6                4           0 


Watts Language Ages. 
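Language ages on the Watts Scale are written as years and months, so 4.11 means 4 years 11 months, not a decimal fraction of a year. The small Python sketch below (helper names hypothetical) converts that notation to months. Applied to the two group means quoted in the text, it gives a gap of 9 months, the figure reported for the Watts Scale in the Summary.

```python
def to_months(age: str) -> int:
    """Convert a 'years.months' age such as '4.11' to a whole number of months."""
    years, months = age.split(".")
    if not 0 <= int(months) < 12:
        raise ValueError(f"month part out of range in {age!r}")
    return 12 * int(years) + int(months)

def from_months(total: int) -> str:
    """Format a number of months as 'Y years M months'."""
    return f"{total // 12} years {total % 12} months"

mean_n = to_months("5.4")   # Group N: 5 years 4 months
mean_r = to_months("4.7")   # Group R: 4 years 7 months
print(from_months(mean_n - mean_r))  # 0 years 9 months
```

Treating 4.11 as a decimal would place it below 4.6 and scramble the order of the class intervals, which is why the conversion matters.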

Every comment made by each child during this part of the interview was also 
recorded verbatim. A comparative qualitative analysis of these spontaneous remarks 
made while working on the Watts Scale showed that Group R made twice as many 
such comments. They seem to fall into six groups: 

(1) Questions concerning the content of the picture, e.g. “where are the children 
going?” 

(2) Interpretation of motive and elaboration of the story, e.g. “The cat thinks the 
children will chase him, so he won’t come down the tree”. 

(3) Comments to attract attention, e.g. “look at it”. 

(4) Comments anticipating the next picture or activity, e.g. “I want to see the 
next one”. 

(5) Comments concerning the room, test material, etc., e.g. “This is the 
school room”. 

(6) Self-references, e.g. “I have a baby at home”. 

The types of comment made by each group are shown in Table 12. The differences found 
between the two groups are very suggestive. The children in the residential nursery 
made more attention-seeking and anticipatory comments and asked more questions about 
the content of the pictures. Perhaps they enjoyed receiving individual attention from 
an adult so much that they were anxious to prolong the opportunity. Group N apparently 
did not share this need to make the most of the situation. On the other hand, Group N’s 
markedly greater proportion of self-references suggests that these children were not 
only more accustomed to talking with adults about their own activities and possessions, 
but also assumed adult interest in their affairs. The other type of comment made to 
a greater extent by Group N was a reference to the motives of people or animals 
shown in the pictures. This was probably due to the greater ease with which the nursery 
school children identified themselves with the events depicted in the test. This 
greater readiness to identify may also be related to the superior language development 
of Group N, since identification promotes imitation, which is itself the basis of speech. 
There may also be a link here with the results obtained from the Wechsler vocabulary 
test, where Group N more often gave definitions indicating active participation. 
Moreover, of Group R’s 19 interpretative comments, 9 implied naughtiness on the 
part of the children, such as “Someone’s been bad, policeman want to send them 
to the police station and lock them up so they can’t come out”. Since there was 
nothing in the illustrations to evoke ideas of badness or punishment, this would seem 
to be of some significance; all the interpretative comments of Group N arose 
naturally from the pictures. 


TABLE 12 
                                            Group N          Group R 
Type of Comment                            N      %         N      % 
Questions about content                    7     10%        38     25% 
Interpretations of motive                 18     24%        19     13% 
Comments to attract attention              6      8%        42     28% 
Anticipatory comments                      6      8%        34     22% 
Comments on testing room and materials     9     12%        14     10% 
Self-references                           28     38%         3      2% 
Total                                     74                150 


Spontaneous comments during work on the Watts Scale. 
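The percentage columns of Table 12 give each category's share of the group's total comments. Recomputing them in Python is a check made for this note, not the author's procedure. It confirms the totals of 74 and 150, puts Group R's share of questions about content at 25%, and agrees with every other printed percentage to within one point of rounding.

```python
# Spontaneous comments during the Watts Scale (Table 12): (Group N, Group R)
COMMENTS = {
    "Questions about content": (7, 38),
    "Interpretations of motive": (18, 19),
    "Comments to attract attention": (6, 42),
    "Anticipatory comments": (6, 34),
    "Comments on testing room and materials": (9, 14),
    "Self-references": (28, 3),
}

def shares(counts, column):
    """Return the group total and each category's percentage of it."""
    total = sum(pair[column] for pair in counts.values())
    return total, {k: 100 * pair[column] / total for k, pair in counts.items()}

for label, column in (("Group N", 0), ("Group R", 1)):
    total, pct = shares(COMMENTS, column)
    print(label, total)
    for category, p in pct.items():
        print(f"  {category:<40}{p:5.1f}%")
```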


3. Spontaneous and undirected speech. The verbatim records made during the free 
play sessions were analysed from three points of view: the incidence of (a) egocentric 
as against social use of language ; (b) mature forms of conversation and (c) phantasy 
and humour. 

(a) The incidence of egocentric versus socialised speech. From being primarily an 
individual activity, speech becomes the major tool for social intercourse. Piaget (1926) 
attempted to measure the degree to which egocentric speech gives place to socialised 
speech. He found that at any given age the proportion of egocentric to other 
spontaneous forms of language was approximately constant. The degree of 
egocentricity could be measured, he suggested, by the coefficient of egocentrism: 

$$\text{coefficient of egocentrism} = \frac{\text{egocentric language}}{\text{total spontaneous language}}$$

The two main characteristics of egocentric language are that it is not addressed to 
anyone and that it usually does not evoke a response from others. The coefficient of 
egocentrism was calculated for each group; although it was higher for Group R, the 
difference was not statistically significant (0.387 for Group N and 0.510 for 
Group R). 
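Piaget's coefficient is simply the egocentric fraction of all spontaneous utterances. The minimal Python sketch below assumes that each utterance has already been coded by an observer. The labels "egocentric" and "socialised" and the sample coding are hypothetical.

```python
from collections import Counter

def coefficient_of_egocentrism(codes):
    """Egocentric utterances divided by all spontaneous utterances."""
    if not codes:
        raise ValueError("no spontaneous utterances recorded")
    tally = Counter(codes)
    return tally["egocentric"] / len(codes)

# Hypothetical coding of eight utterances from one play session
session = ["socialised", "egocentric", "socialised", "socialised",
           "egocentric", "socialised", "egocentric", "socialised"]
print(coefficient_of_egocentrism(session))  # 0.375
```

The difficult part is the coding itself, which rests on the two criteria given above. Once that is done, the coefficient is a single ratio.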
Next, the children’s remarks were analysed into three categories: (i) those directed 
to the child(ren) with whom the speaker was playing, (ii) those directed to the 
investigator and (iii) non-directed ones; for example, looking at some pictures, a child 
would murmur “look at this”, while making no effort to attract anyone’s attention or 
looking to see whether anyone was heeding his remark. As can be seen from Table 13, 
the total number of remarks made by Group N is somewhat larger. But the more marked 
difference between the two groups was in the proportion of remarks directed to other 
children. The considerably greater number of child-directed remarks made by 
Group N in fact reflects the way in which these children played together. Many in 
Group R could hardly be said to be playing together at all, whereas in Group N 
a good deal of co-operative group play was observable. A small distorting factor was 
undoubtedly introduced by the presence of the investigator, and this had a more 
distracting influence on Group R. At the same time, Group N showed a degree of 
absorption in their play equalled by only a few children in Group R. 


TABLE 13 
                         Group N          Group R 
Type of remark          N      %         N      % 
Child-directed         596    56%       358    37% 
Adult-directed         247    23%       315    32% 
Non-directed           226    21%       301    31% 


Direction of children’s comments. 


(b) The incidence of maturer forms of conversation. Using a simplified version of 
Piaget’s classification of socialised speech, an attempt was made to analyse the 
children’s dialogue for examples of more mature forms of conversation. The following 
headings were chosen: 

i) collaboration in action or non-abstract thought 

ii) collaboration in abstract thought 

iii) clash of opinion 

iv) arguments, differing from (iii) in that reasons were put forward for the 
differing points of view 

Though the first-mentioned type of conversation occurred most frequently in both 
groups, once again Group N provided more examples. Collaboration in abstract 
thought did not occur at all in Group R but was beginning to show in a rudimentary 
and rather confused way among Group N. Clashes of opinion and arguments occurred 
with equal frequency in both groups, but the nursery school children were more 
successful in reaching peaceful solutions or compromises. 

(c) The incidence of phantasy and humour. Although a detailed consideration of 
these two aspects is beyond the scope of this study, some mention of them must be 
made because of the marked difference between the two groups. Phantasy appeared 
in two forms: i. objects or situations were interpreted imaginatively; for example, 
a boy playing in the sand pit said “Have this ice cream, I made it for you”. ii. pure 
make-believe; for example, a girl looking out of the window called out “there is a 
lion coming. Call a policeman”. Both these forms of phantasy occurred more 
frequently in the conversation of Group N (Table 14). Some of the children in 
this group had already reached the stage where imaginative play no longer depended 
on, or arose from, the stimulation of the material world. The difference in the 
use of phantasy showed itself not merely in the number of occasions on which it 
entered conversation, but also in the greater persistence with which members of 
Group N developed an imaginative idea, returning to it after an interruption. This 
was not observed among Group R. 


TABLE 14 
                               Group N     Group R 
Imaginative interpretation        23          11 
Pure make-believe                 16           2 
Total                             39          13 


Phantasy expressed in speech. 


At this age humour is, of course, very rudimentary and crude. Therefore all 
occasions on which speech gave rise to laughter, or where it was used for the sheer 
fun of it, were counted as instances of verbal humour; for example, a girl pretending 
to be an old lady said: “I hurt my bones”; another girl: “where are your bones” 
(with both laughing); or a boy chanting amid the laughter of a small group: “I 
fell on a banana and a butterfly picked me up”. Though some of these instances 
could hardly be described as intrinsically humorous, they proved to be amusing to 
the children. Moreover, they represented an attempt at experimenting with words. 
It was perhaps the beginning of an awareness that speech can be manipulated so as 
to arouse amusement in others and that words, like toys, can be used as vehicles 
for imagination. This awareness had hardly dawned on Group R. Whereas 9 
children in Group N took an active part in a total of 11 humorous exchanges, 
there was only one example in Group R. 


SUMMARY AND CONCLUSIONS 


Wherever quantitative comparison was possible, Group N was found to be in 
advance of Group R. Differences ranged from 15 months on the Merrill-Palmer 
verbal items, 11 months on the Wechsler Vocabulary and 9 months on the Watts 
English Language Scale to 5 months on the Terman-Merrill Picture Vocabulary. 
In conversation, Group N used a wider vocabulary, showed rather better descriptive 
powers and a more mature type of sentence formation. On the other hand, there was 
little difference between the two groups in their use of the main grammatical 
components of speech; many definitions were given in similar terms, and both groups 
tended to find the same words difficult to explain (in the Terman-Merrill and Wechsler 
vocabularies). Thus it seems that although Group R was retarded in the formal 
aspects of language, their speech was nevertheless developing along normal lines. 

Some suggestive qualitative differences were noted. Group N tended to express 
in their definitions as well as in their free play a greater degree of active participation 
and aggressive self-assertion. The theme of naughtiness and crying seemed to 
preoccupy some children in the residential nurseries. But perhaps most marked were 
verbally expressed differences in attitudes both to adults and to contemporaries. 
A craving for adult attention is commonly observed by visitors to residential nurseries, 
and our findings confirmed this. Group R were more anxious to be with and speak 
to the investigator, whether other children were present or not. Adult attention was 
actively sought instead of being accepted in a matter-of-fact way. At the same time, 
Group R children spoke less of themselves or their belongings. In contrast, the 
nursery school children apparently took adult interest in their doings for granted 
and made frequent reference to their own activities and possessions. Among the 
factors which may account for this difference, the following are likely to be important. 
First, in residential nurseries play materials tend to be communally owned, which must 
inhibit or at least delay the development of the pride of possession engendered by 
having personally owned toys and belongings; one would therefore not expect the 
same desire to talk about and show off valued possessions. Secondly, it is very difficult 
to provide for children in care the range of experiences which occur as part of normal 
family life, such as shopping, tradesmen calling, watching mother cook and father 
shave, visits to and from relatives, etc. This means that there are fewer non-routine, 
and thus exciting, happenings to absorb and to recount in conversation. Thirdly, 
the Nursery Nurses who are in closest contact with the children are very young, and 
their training is strongly biased in favour of physical health, hygiene and habit 
training. The Matrons of Residential Nurseries are usually hospital-trained nurses. It 
is the exception, therefore, to find emphasis placed on mental development in general 
and language growth in particular. 

Just as there was more co-operative play activity among Group N, so speech 
was used spontaneously by them to a greater extent in social contacts with 
contemporaries. This was reflected in the rather lower coefficient of egocentrism and the higher 
proportion of remarks directed to other children. Again, Group N made greater 
use of verbally expressed phantasy and showed some persistence in following through 
an imaginative theme ; even when phantasy did appear in the conversation of Group R, 
it was of a more fleeting nature. Lastly, Group N was beginning to appreciate the 
possibility of manipulating words and phrases for their own amusement and that of 
others, while only one child in Group R used language for this purpose. 
If it be accepted that the medium of phantasy is one important way in which 
children learn to assimilate new experiences and come to terms with the difficulties 
and frustrations inseparable from growing up in a complex society, then children 
in care would seem to have a particularly great need in this respect. We cannot 
judge from our evidence whether Group R lacked to some extent the necessary 
verbal skills or whether their phantasies were too deeply repressed for overt expression 
to appear. Might the avoidance of the words “mummy” and “home” be an indication 
of the latter possibility? Similarly, deprived children tend to see their own 
naughty behaviour or “badness” as the reason for having been sent away from 
home. One wonders whether the recurrence of themes of naughtiness and crying 
among Group R is linked to such phantasies. 

Although there was considerable overlap in the achievements of the two groups 
studied, our evidence confirms that there is some retardation in the language skills 
of pre-school children in care. In so far as Group R lacked the ability to verbalise 
phantasy and to use speech in making social relationships with contemporaries, 
to that extent their general emotional and social development may be adversely 
affected. Opinions may differ as to which should be regarded as the cause and which 
the effect. But there is little doubt that such language difficulties as we found are 
likely to have long-term consequences unless remedied. For example, readiness to 
commence school and the ability to learn to read are the main educational tasks which 
will be adversely affected. 
PERCEPTION OF COMPOUND CONSONANTS* 


J. M. Pickett** 


Operational Applications Laboratory, Bolling Air Force Base 25, D.C. 


The perceptual confusions among English compounds of two consonants were 
examined. One defined class of syllables, made up of 15 initial compound consonants 
in conjunction with three vowel sounds /i/, /a/, and /o/, and another class of 15 
final compound consonants in conjunction with the same three vowel sounds, were 
employed. Recorded syllables were played back against a white noise background 
and against a low-frequency noise background. Confusion patterns among the 
compound consonants depended upon the articulatory dimensions of the individual 
consonant members which formed the compound. That is to say, there was little 
evidence of interaction among the members of the compound. The confusion patterns 
indicated that the low speech frequencies, i.e., those frequencies heard above the 
white noise, convey the consonant distinctions of nasal vs. glide vs. stop, and the 
distinctions among glides. On the other hand, the higher speech frequencies, heard 
above the low-frequency noise, convey the distinction of affrication and the place 
distinctions among nasals and stops. The different vowels had minor effects on the 
perception of the consonants adjacent to them. 


Compound consonant sounds are produced when a consonant articulation consists 
of two or more different component articulations. The component articulations are 
usually closely fused (Stetson, 1951). Such compounds are a prominent feature of 
heavily stressed languages, such as English and German. 


In English, the compound consonants are an appreciable factor in speech 
communication: in a sample of conversations, 15% of the syllables were either initiated 
or terminated by a compound consonant (French et al., 1930; Fletcher, 1953). 


* This is Technical Report 58-3 ASTIA Document No. 160710 of the Air Force Cambridge 
Research Center. This research is part of Project 7681, Auditory Presentation of Information, 
of the Human Engineering Program of the Air Research and Development Command. 

** Now at Naval Research Laboratory, Washington, D.C. 


1 The following terms will be used throughout the paper. A compound consonant as a whole 
will be referred to as a consonant or a compound. The “ individual consonants” which form 
a compound will be called components. These terms are suggested by Stetson to be consistent 
with the fact that the components of a compound, although they may have articulatory features 
like those of single consonants, are fused in articulation so as to perform but one function, 
namely, an accessory role in initiating or terminating a syllable (op. cit. pp. 83-88, 124-125). 






















How are compound consonants perceived? For practical purposes of language 
engineering, it would be valuable to know the intelligibilities and confusion tendencies 
of compound consonants in noise. Also, as in the case of single consonants, the 
confusion tendencies may be related to the articulatory features (dimensions) of the 
consonants (Miller and Nicely, 1955). 

The purpose of the present study was to examine the perception in noise of some 
of the compound consonants of spoken English. We follow the method used by 
Miller and Nicely (1955) for examining the perceived dimensions of the single 
consonants. Intelligibility tests of compound consonants were first carried out in 
noise under conditions designed roughly to mask certain articulatory dimensions. 
Stimulus-response matrices were then constructed to display the perceptual dimensions 
which were still heard over the noise. Our study was carried out with two-member 
compounds spoken in initial and in final positions in test syllables with the vowels 
/i/, /a/, and /o/ (spoken respectively as in the words speed, spa, and spoke). 

Two different noise spectra were employed in the intelligibility tests. The noise 
spectra were chosen with two aims in mind. First, to the extent that the acoustic 
cues for hearing consonant dimensions are restricted to certain frequency regions, 
different noise spectra for masking will allow us to discern different perceptual 
dimensions in the responses of the listeners. Second, for language engineering purposes, 
we should like to have data under noise spectral conditions representing the extremes 
encountered in noisy locations where speech communication may be attempted. 


PROCEDURES 


In choosing the consonant compounds to be tested, an attempt was made to 
compromise between two requirements: (1) all of the most common articulatory features 
should be included, and (2) the most common compounds of spoken and written 
English should be included. The available counts of the frequency of occurrence 
of different compound consonants were consulted (French et al., 1930; He-dan, 1956). 
Initial and final sets were chosen which embraced 90% of the occurrences of compound 
consonants. The sets chosen may be seen in Table 1, where the compound consonants 
are grouped according to their articulatory features. There were 15 different 
compounds in each set. 

Test lists of the consonant compounds were prepared and printed to serve as 
key lists for both talkers and listeners. Each list contained three random arrangements, 
one for each different vowel, of a double set of the initial compounds and of a 
double set of the final compounds. A test list thus consisted of six columns, each 
column containing a set of 30 initial or final compounds. Each column of 30 compounds 
was spoken with a single vowel. All three vowels occurred with each set of consonants 
on a list. The order of the three vowels was randomized in a list. The order of 
the initial and final sets alternated on a list, half the lists beginning with an initial 
set. Eight different lists were prepared. 
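The list design can be written as a small generator. The Python sketch below is one reading of the description, not the author's procedure. Each column holds a doubled 15-compound set (30 items) in random order. Every vowel occurs once with the initial set and once with the final set, and the vowel order is shuffled. Initial and final columns alternate, and half of the eight lists begin with an initial set. The initial compounds are those heading Tables 3 and 4. The final set is represented by hypothetical placeholder labels, because the sketch needs only some 15-item set.

```python
import random

VOWELS = ["i", "a", "o"]
INITIAL = ["pr", "tr", "kr", "tw", "kw", "kl", "pl", "fl",
           "sl", "sn", "st", "sk", "sw", "sm", "sp"]

def make_test_list(initial, final, start_with_initial, rng):
    """Return six (position, vowel, items) columns for one key list."""
    def columns(position, compounds):
        vowels = VOWELS[:]
        rng.shuffle(vowels)  # randomise the vowel order within the list
        cols = []
        for vowel in vowels:
            items = compounds * 2  # a double set: 30 items per column
            rng.shuffle(items)
            cols.append((position, vowel, items))
        return cols

    first, second = columns("initial", initial), columns("final", final)
    if not start_with_initial:
        first, second = second, first
    # interleave so that initial and final sets alternate on the list
    return [col for pair in zip(first, second) for col in pair]

# Placeholder labels standing in for the 15 final compounds (hypothetical)
FINAL = [f"F{k}" for k in range(1, 16)]
rng = random.Random(1)
lists = [make_test_list(INITIAL, FINAL, k % 2 == 0, rng) for k in range(8)]
```

A fixed seed keeps the lists reproducible, which is useful when the same printed key lists must serve both talkers and listeners.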

The consonant compounds were spoken in nonsense syllables of the form CCVb 
TABLE 1 
Initial Compounds Final Compounds 
pr : rd 
tr Stop +r It r + Stop 
kr . 1 + Stop 
tw + Without | Without 
Stop + w ena ad 
kw Fricative nd ? : Fricative 
os n + Mid Stop 
kl Stop + 1 ng 
pl P nk n + Back Stop 
fl Fricative + nz] _ Nasal + 
d Po na mz { Voiced Fric 
rs ] : . 
‘ lide + F . 
st Fric + Stop With ‘oo ¢ _ With 
sk Fricative Fricative 
ks 
sw ts Stop + Fric 
sm Fric + Front | st 
‘Pp 


Consonant compounds tested. 


and bVCC. Four experienced male talkers were used. Each talker recorded two 
different six-column lists; each test syllable was spoken fluently in the carrier phrase 
“You will try (test syllable).” A test syllable occurred about every 3 sec. The talkers 
carefully monitored their speech signals on a VU meter so as to provide a constant 
average reading on the carrier phrases.² 

The test recordings were played back through earphones to a crew of four to 
six experienced listeners. Noise was mixed with the speech signals in order to 
produce confusions in hearing the compounds. The noise and speech signals were 
mixed electrically, amplified with negligible distortion, and then presented over 
the earphones (Permoflux PDR-8’s). The listeners wrote on their key lists the 
consonant compounds which they heard, restricting their responses to the particular 
sets of compounds used in the test lists. Each response was a compound consonant. 
The response was written beside the compound spoken, the latter being covered with 
an opaque mask. This arrangement provided a convenient form of the data for 
compiling the confusion matrices. The listeners were asked to guess when uncertain 


2 The recordings were made in an anechoic room. The talkers’ average speech level was 
70 db re 0.0002 dynes per square centimeter at a distance of one meter from the lips. The 
speech was received by a microphone 12 in. from the lips. The intensity-frequency characteristic 
of the microphone-recorder system was essentially flat over the range 50-7000 cycles per sec. 
and they responded to approximately 97% of the syllables presented. A series of 
preliminary runs under experimental conditions was carried out until the listeners 
were thoroughly familiar with the consonant sets. 

The effects of two random noise spectra were examined. The spectra, as measured 
at the input to the earphones, consisted of a “flat” noise having equal intensity 
at all frequencies over its range, 100 to 6800 cycles, and a low-frequency noise having 
a spectrum slope of −12 db per octave over the same range. The average overall 
sound pressure level under the earphones was 86 db for the flat noise and 103 db 
for the low-frequency noise. 
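The two maskers can be imitated digitally by shaping white Gaussian noise in the frequency domain. The NumPy sketch below is only an illustration. The original noises were analogue signals specified at the earphone input. The sampling rate, the 100-cycle reference frequency and the function names are all assumptions. A slope of 0 db per octave gives the flat noise. A slope of −12 gives the low-frequency noise, whose spectrum level falls 12 db for each doubling of frequency between 100 and 6800 cycles.

```python
import numpy as np

def band_gain(freqs, slope_db_per_octave, f_lo=100.0, f_hi=6800.0, f_ref=100.0):
    """Amplitude gain giving a constant dB-per-octave slope inside [f_lo, f_hi]
    and zero gain outside that band."""
    gain = np.zeros_like(freqs, dtype=float)
    inside = (freqs >= f_lo) & (freqs <= f_hi)
    level_db = slope_db_per_octave * np.log2(freqs[inside] / f_ref)
    gain[inside] = 10.0 ** (level_db / 20.0)
    return gain

def shaped_noise(n_samples, fs, slope_db_per_octave, rng=None):
    """White Gaussian noise filtered to the given spectral slope, scaled to unit RMS."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(rng.standard_normal(n_samples))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    noise = np.fft.irfft(spectrum * band_gain(freqs, slope_db_per_octave), n=n_samples)
    return noise / np.sqrt(np.mean(noise ** 2))

fs = 16000  # assumed sampling rate, comfortably above twice 6800 cycles
flat = shaped_noise(fs, fs, 0.0, np.random.default_rng(0))
low_freq = shaped_noise(fs, fs, -12.0, np.random.default_rng(0))
```

Both signals are normalised to unit RMS, so the absolute sound pressure levels quoted above (86 and 103 db) would have to be set by the playback chain; they are not modelled here.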

One series of tests was carried out with each noise spectrum at a low speech-to-noise 
(S/N) ratio chosen to provide a moderately large number of confusions for analysis. 
The S/N ratios were measured with a VU meter. The low S/N ratio used with 
the flat noise was −4 db; with the low-frequency noise it was −30 db. In addition, a 
shorter series of tests was carried out with the flat noise at an S/N ratio of +6 db. 
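Setting a given S/N ratio amounts to scaling the noise relative to the speech before the two are added. The study read both levels on a VU meter. The NumPy sketch below substitutes RMS level for that reading, which is an assumption. The function names and the sine-wave stand-in for speech are also hypothetical.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 20*log10(rms(speech)/rms(noise)) equals snr_db,
    then add it to the speech. Returns (mixture, scaled noise)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech")
    noise = noise[: len(speech)]
    target_noise_rms = rms(speech) / 10.0 ** (snr_db / 20.0)
    scaled = noise * (target_noise_rms / rms(noise))
    return speech + scaled, scaled

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(8000) / 8000)  # stand-in for a speech signal
noise = rng.standard_normal(8000)
for snr in (-4.0, 6.0, -30.0):  # the S/N ratios quoted in the text
    mixed, scaled = mix_at_snr(speech, noise, snr)
    print(snr, round(20 * np.log10(rms(speech) / rms(scaled)), 2))
```

A VU meter responds to a smoothed average of the signal, so real speech, with its pauses, would read differently from a steady RMS figure. The sketch shows only the scaling arithmetic.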

The total number of responses to each different compound varied somewhat under 
the various conditions. The mean number of responses per compound, over the set 
of 15 compounds, is given in Table 2 for each condition of noise spectrum, consonant 
position, vowel, and S/N ratio. For each condition, the range of the per-compound totals 
was expressed as a percentage of their mean; the average percentage range over all conditions was 8.5%. 
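The spread statistic can be written out explicitly. For each condition, the range of the response totals across the 15 compounds is divided by their mean, and these percentages are then averaged over conditions. Table 2 gives only the means, so the Python sketch below uses invented counts purely to show the arithmetic.

```python
from statistics import mean

def percentage_range(counts):
    """(max - min) expressed as a percentage of the mean."""
    return 100.0 * (max(counts) - min(counts)) / mean(counts)

def average_percentage_range(conditions):
    """Mean of the percentage ranges over all conditions."""
    return mean(percentage_range(c) for c in conditions.values())

# Hypothetical per-compound response totals for two conditions (15 compounds each)
conditions = {
    "flat, low S/N, /i/, initial": [196, 200, 204] * 5,
    "flat, high S/N, /a/, final": [104, 108, 112] * 5,
}
print(round(average_percentage_range(conditions), 2))
```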


TABLE 2 
Noise condition            Vowel    Initial Set    Final Set 
Flat noise, low S/N          i         200.2          213.1 
                             a         222.9          221.1 
                             o         202.7          215.3 
Flat noise, high S/N         i         109.3          109.7 
                             a         117.0          108.2 
                             o         115.4          113.9 
Low-frequency noise          i         260.4          223.9 
                             a         237.7          232.9 
                             o         211.0          231.2 


Number of responses under the various experimental conditions. Each entry is the mean 
number of responses per member of a 15-member set of compound consonants. See text. 


After all the tests were completed, the listeners tallied the stimulus-response pairs 
in stimulus-response matrices. Each cell of a matrix received tallies recording the 
number of times a particular response compound had been heard when a particular 
stimulus compound had been spoken. For the two sets of compound consonants, a 
separate matrix was compiled for each combination of noise spectrum and vowel. 
Also, since the vowels did not have exceedingly strong effects on the patterns of 
TABLE 3 
RESPONSE CONSONANT 
kw kl pl fl sil sn st sk 


tw sm 

pr 94 142 72 30 26 23 46 26 30 10 63 24 9 5 22 
38 31 21 25 20 34 11 96 134 S 
32 1 


kr 45 106 106 39 41 38 30 44 13 61 32 12 28 
tw 22 44 29 110 99 a 5 26 33 «(5 52 28 1 50 31 
kw 10 25 26 85 201 2 31 13 33 8 32 3 3 76 20 


kl xa 6 SS 36 42 88 81 56 59 10 49 20 13 9 26 
pl 23 34 23 48 30 65 142 S$ 7 23-22 12 19 22 


fl 28 32 14 29 26 46 94 186 74 18 35 ll 14 6 13 
sl i3 27 7 31 18 42 44 65 246 24 42 23 20 14 11 


sn 62 ll 11 8 17 11 36 238 a 63 6 24 
at 15 38 20 25 16 13 18 19 34 27 265 83 7 4 33 
sk 9 37 Il 7 17 25 8 27 32 172 176 14 18 42 


sm 5 17 10 14 17 10 14 22 55 116 27 «(34 256 15 14 
sw 14 26 18 57 66 28 33 a 677 «67 50 41 12 134 26 
sp 16 35 19 24 19 24 39 16 54 16 104 72 15 18 145 


Confusion Matrix for Initial Consonants Heard in Flat Noise, S/N = −4 db. 


confusion of the compounds, four summary matrices were prepared by adding together 
the individual matrices for the three vowels. The summary matrices are presented 
in Tables 3-6. 
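The pooling of the three per-vowel matrices into one summary matrix, as described in the text, is a cell-by-cell sum. The following is a minimal Python sketch, assuming each per-vowel matrix is stored as a list of rows (stimulus compound) by columns (response compound), all in the same compound order; the name `pool_matrices` and the dictionary keyed by vowel are hypothetical, for illustration only.

```python
from typing import Dict, List

Matrix = List[List[int]]  # rows: stimulus compound, columns: response compound


def pool_matrices(per_vowel: Dict[str, Matrix]) -> Matrix:
    """Add per-vowel confusion matrices cell by cell into one summary matrix."""
    matrices = list(per_vowel.values())
    n = len(matrices[0])
    # Every matrix must be n x n and use the same compound order.
    if any(len(m) != n or any(len(row) != n for row in m) for m in matrices):
        raise ValueError("all matrices must be n x n in the same compound order")
    return [[sum(m[r][c] for m in matrices) for c in range(n)] for r in range(n)]


# Toy 2 x 2 example with made-up counts for the vowels /i/, /a/, /o/.
pooled = pool_matrices({"i": [[1, 2], [3, 4]], "a": [[0, 1], [1, 0]], "o": [[5, 0], [0, 5]]})
```

With the 15 x 15 matrices of the experiment, each row total of a pooled matrix is roughly three times the per-vowel mean listed in Table 2.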

It was also desired to have a convenient way of portraying the shifts in confusion 
patterns which might occur with changes in noise spectrum and changes of vowel. 
For this purpose, the matrices were simplified by showing only the most frequent 
confusions. These were represented in each simplified matrix by entering dots in 
place of the confusion frequencies of the original matrix. The 21 most frequent 
confusions of the matrix were assigned a large dot; the 21 next most frequent 
confusions were assigned a medium-sized dot; and the 21 next most frequent confusions
were assigned a small dot. The simplified confusion matrices are presented in Figs. 1-4. 
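The simplification rule amounts to ranking the cells of a matrix by frequency and replacing them with three dot sizes. Below is a minimal Python sketch of that ranking, assuming ties are broken arbitrarily (here by row-major order) and that empty cells never receive a dot. The text does not say whether the correct responses on the diagonal were ranked together with the confusions, so `include_diagonal` is an assumed parameter. The function name and the letter codes `L`, `M`, `S` are hypothetical.

```python
from typing import List


def simplify_matrix(counts: List[List[int]], group_size: int = 21,
                    include_diagonal: bool = True) -> List[List[str]]:
    """Replace cell counts by dot codes: 'L' (large) for the group_size most
    frequent cells, 'M' (medium) for the next group_size, 'S' (small) for the
    next group_size, and '.' for every other cell."""
    n = len(counts)
    # Candidate cells as (count, row, column); zero cells are skipped so that
    # an empty cell never receives a dot (an assumption, not from the source).
    cells = [(counts[r][c], r, c)
             for r in range(n) for c in range(n)
             if counts[r][c] > 0 and (include_diagonal or r != c)]
    # Stable sort by descending count: tied cells keep row-major order,
    # an arbitrary tie-break since the source does not describe one.
    cells.sort(key=lambda cell: -cell[0])
    dots = [["."] * n for _ in range(n)]
    for rank, (_, r, c) in enumerate(cells[:3 * group_size]):
        dots[r][c] = "LMS"[rank // group_size]
    return dots
```

For a 15 x 15 matrix with the default `group_size=21`, 63 of the 225 cells receive a dot. A toy call with `group_size=1` marks only the three largest cells.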


MATRIX ARRANGEMENT BY CONSONANT DIMENSIONS 


A number of different arrangements were tried for the order of the compounds
along the sides of the matrices. The final order used represents an attempt to produce
a strong diagonal pattern of the confusions in low-frequency noise while still keeping
the compounds grouped according to articulatory features. The primary distinction*
among the initial compounds heard through the low-frequency noise was the presence
or absence of a fricative component. The eight initial compounds containing a
fricative component were grouped according to the articulation of the second
component as follows: alveolar continuants, /l, n/; labials, /w, m, p/; and stops,
/t, k/. Among the seven initial compounds not containing a fricative, the arrangement
was according to the articulation of the glide component, /r, l, w/. In the simplified
confusion matrices, the grouping of the compounds is indicated by the heavy and
medium lines between the rows and columns.

TABLE 4

RESPONSE CONSONANT
   pr tr kr tw kw kl pl fl sl sn st sk sw sm sp
pr 224 115 163 39 33 18 47 32 6 0 y 2 6 1 8
tr 70 425 42 68 19 13 22 15 4 3 11 4 5 4 8
kr 90 102 239 38 65 40 41 sos @ F 14 4 3 6 16
tw 29 74 12 425 54 10 52 22 11 ~«#1 12. 7 8 3
kw 32 36 55 90 288 41 44 38 «6 «65lll 10 9 34 3 «(ill
kl 17 55 37 9 71 173 130 51 16 8 22 2 7 #1 13
pl 30 48 30 79 26 87 224 100 18 5 26 #7 > 2. 2
fl zw 38 22 38 «36 83 120 265 38 6 18 8 4 2 12
sl & 8 2 4 4 a F 12 389 61 104 17 30 14 34
sn > § 2 6 2 0 6 1 83 348 125 32 23 43 25
st 7 11 9 6 2 4 3 8 26 51 468 71 8 1 35
sk 213 4 » g es 2 8 32 33 103 393 22 19 62
sw 5 6 1 7 9 0 6 2 19 19 15 36 452 33 101
sm » 2 mw : @ 7 37 6 16 44 66 323 141
sp >’ @ § 11 7 3 6 5 4 21 30 51 62 46 54 419

Confusion Matrix for Initial Consonants Heard in Low-Frequency Noise, S/N = -30 db.

A similar procedure was carried out for the compounds in final position. The
fricative feature again appeared to be the distinction best heard in low-frequency
noise. Among the eight compounds not having a fricative, the secondary distinction
of nasality was good. Among those with a nasal, the articulatory position of the main
stop provided a third distinction. The remainder of the non-fricative final compounds,
/rd, rt, ld, lt/, were grouped according to the articulation of the glide component.
Among the seven final compounds having a fricative component, groups were formed
by those having a nasal, /nz, mz/, those having a liquid, /rs, ls/, and those having
a stop, /ks, ts, st/.

* The word distinction will be used generally to refer to the perceptual discrimination of one
articulatory feature from others. Thus it also implies, as in general linguistic usage, the
existence of articulatory distinctions.

TABLE 5

RESPONSE CONSONANT
   rd rt ld lt nd nt nk ng nz mz rs ls ks ts st
rd 375 50 44 19 sy * F -s s 3 67 18 7-3 &
rt 48 308 27 52 @ «mF 3 > 93 23 7 BD
ld 62 15 307 79 16 4 18 14 8 4 20 90 10 5 12
lt 11 32 69 243 2 @ © - 2 32 127 33 17 20
nd Pp + Ff 303 48 40 105 37 (42 4 3 o @ F
nt 4 22 20 56 211 148 45 18 10 1: @ 28 29 37
nk > &¢ @ 34 71 325 58 20 15 ,> 8 a2
ng 2° &. a 6 67 19 56 400 39 28 j 2 . 7 @
nz 6 9 12 10 127 29 34 80 211 105 10 4 a a
mz i 7H 5 %.21 33 72 153 196 5 13 12 6
rs 89 111 39 33 Ss 2 6 5 6 tf 259 38 8 i
ls 26 39 141 149 10 14 8 9 . 23 174 16 12 10
ks 5 16 6 40 13 17 30 15 9 11 13 23 243 127 73
ts 9 22 18 31 11 21 36 11 8 7 12 16 137 222 79
st 17 28 24 37 18 30 45 19 6 4 18 28 94 90 202

Confusion Matrix for Final Consonants Heard in Flat Noise, S/N = -4 db.


PERCEPTUAL CONFUSIONS OF INITIAL COMPOUNDS 


Attention may first be given to the simplified summary matrices for the initial
compounds, Fig. 1. We begin by comparing the confusion tendencies in flat noise at
the low S/N ratio with those in the low-frequency noise. It will be noted that the
presence of a fricative component was more easily distinguished in low-frequency
noise than in flat noise. The hearing of affrication in a compound is shown by the
extent of dispersion of the confusions among the four main quadrants of the matrices
of Fig. 1. When the confusions are restricted largely to the two diagonal quadrants,
as with the low-frequency noise, the presence of affrication is well heard. When
the confusions are more dispersed over the entire matrix, as with the flat noise, the
affrication is poorly heard. The compound /fl/, though it contains a fricative,
was often confused with the stop-liquid compounds, even in low-frequency noise.
This exception is consistent with Fletcher’s finding (1953, p. 422) that hearing /f/
depends on hearing the middle speech frequencies, while hearing /s/ depends on
hearing the high speech frequencies. In our case, the middle speech frequencies
are masked more by the low-frequency noise than are the high speech frequencies.

J. M. Pickett 295

TABLE 6

RESPONSE CONSONANT (columns); STIMULUS CONSONANT (rows)
   rd rt ld lt nd nt nk ng nz mz rs ls ks ts st
rd 328 143 31 50 41 19 19 17 5 4 16 11 2 0 10
rt 45 427 14 81 19 47 12 7 3 4 11 10 6 1 10
ld 50 59 195 129 80 44 42 29 3 .6C8 9 15 2 6 14
lt 29 92 59 251 36 88 29 16 4 9 15 19 7 3 24
nd 38 31 66 32 308 68 31 56 11 16 11 7 2 4 10
nt 14 58 19 78 49 320 45 36 14 15 10 8 °° &§ =
nk 17 39 25 53 46 67 294 83 8 13 6 8 8 6 10
ng 18 15 24 30 63 31 44 412 9 21 4 5 S 2 F
nz 2 10 4 24 >zZiaias 335 133 11 46 13 13 12
mz 2 11 #10 16 11 10 2 16 169 307 20 53 » 7 BD
rs as & 662 3 13 16 490 43 38 10 37
ls 8 17 9 24 So 2 & F 36 20 64 357 44 39 48
ks 2. a a 3 1449 6 15 14 64 85 437 69 43
ts 6 10 5 13 5 10 10 10 28 12 34 111 121 242 62
st 5 146 7 19 7M 5 UF 4 4 16 25 17. 12 533

Confusion Matrix for Final Consonants Heard in Low-Frequency Noise, S/N = -30 db.
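The text judges the hearing of the fricative feature by eye, from how much of each matrix falls in the two diagonal quadrants. As a rough numeric proxy (our own, not the author's measure), the share of responses lying in those quadrants can be computed once each compound is flagged as containing a fricative or not. The following is a minimal Python sketch; the function name and the flag list are hypothetical.

```python
from typing import List, Sequence


def diagonal_quadrant_share(counts: List[List[int]],
                            has_feature: Sequence[bool]) -> float:
    """Fraction of all responses in which stimulus and response agree on a
    binary feature, i.e. the share falling in the two diagonal quadrants."""
    agree = total = 0
    for r, row in enumerate(counts):
        for c, n in enumerate(row):
            total += n
            # Both stimulus and response have the feature, or both lack it.
            if has_feature[r] == has_feature[c]:
                agree += n
    return agree / total


# Toy 3 x 3 example: the first compound lacks the feature, the other two have it.
share = diagonal_quadrant_share([[8, 2, 0], [1, 7, 2], [0, 3, 7]],
                                [False, True, True])
```

With the compound order of Table 4, the flags would be seven `False` values (pr through pl) followed by eight `True` values (fl through sp). A share close to 1.0 corresponds to confusions restricted to the diagonal quadrants, as described for the low-frequency noise. The same function applied to a sub-matrix, with a nasal flag, would give a comparable proxy for the nasal-glide sub-quadrants discussed for the final compounds.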

Within the stop-glide group (the non-fricative compounds in the upper left quadrants
of Fig. 1), the distinctions between /r/, /w/, and /l/ are heard somewhat in
low-frequency noise, but much better in the flat noise. Within the group containing a
fricative (lower right quadrants), the place distinctions (front vs. alveolar) are masked
more by the flat noise than by the low-frequency noise. In particular, the distinctions
of /sm/ from /sn/ and of /sw/ from /sl/ are drastically affected by the flat noise.
On the other hand, the glide-nasal-stop distinctions, /sw/ from /sm/ from /sp/, and
/sl/ from /sn/ from /st/, are disturbed by the low-frequency noise but well
preserved in flat noise. When the non-fricative compounds were heard as having a
fricative in the flat noise, those with an /r/ lost their r-component, going largely
to /st/, but the /w/ and /l/ distinctions were still heard.

Fig. 1. Simplified patterns of confusion among compound consonants spoken in initial position
and heard in two noise spectra. The more frequent confusions are represented by the
heavier dots, as shown by the key on the figure. Original data pooled from tests with
three vowels.

We conclude from these observations on the confusions of initial compounds that 
the lower speech frequencies, those heard over the flat noise, convey the distinctions 
among /r, l, w/ and the manner of articulation: i.e., whether a component is nasal,
glide, or stop. The higher speech frequencies heard over the low-frequency noise 
convey the place of articulation of the nasal and stop components and the presence 
or absence of a fricative component. 

We may now compare, in Fig. 1, the confusion patterns in flat noise at the 
low S/N ratio with those at the high S/N ratio. The two patterns are highly 
similar, and thus we are reasonably confident that the conclusions reached by 
comparing noise spectra do not depend merely on the particular S/N ratios chosen for 
the flat noise. 

There appears to be one case in the data of strong interaction between a glide 
distinction and the other component of a compound. When /l/ and /w/ were paired
with /s/, the flat noise caused a strong /sw-sl/ confusion, but relatively little confusion 
occurred between stop-w and stop-l. On the other hand, when the high speech 
frequencies were heard over the low-frequency noise, the /sw-sl/ distinction was 
good while the stop-w and stop-l distinction was poor. It would appear that the 
presence of a fricative before the /w/- and /l/-components causes an upward shift in
the frequency region of their distinctive cues compared with the case where they 
occur after a stop. 

We now turn to the question of interaction between the confusion patterns and 
the three vowels used for the syllables, /i/, /a/, and /o/. The simplified matrices 
from the individual vowel tests, at S/N = -4 db, appear in Fig. 2. Consider the
initial compounds heard in flat noise. There are three indications of interaction 
between the vowels and the perception of the consonant components adjacent to them: 

(1) The /r, l, w/ distinction is heard most poorly before /i/, and best before /a/.

(2) The vowel /i/ causes the compound /st/ to be a popular response for nearly 
all the other compounds. 

(3) The vowel /a/ causes the compound /sl/ to be a popular response for all the 
other compounds. 

Effect (1), the poor /r, l, w/ distinction before /i/, is apparently due to masking
of the high third formant (F3) of /i/, which interferes with the perception of the
third-formant transition necessary for distinguishing /r/ from /l/ (O’Connor
et al., 1957). /w/ is distinguished by a very low, short, steady-state locus before
the transition of the second formant. The F3 for /i/ is approximately 3000 cycles
for male talkers, while that for /a/ is about 2400 cycles (Peterson and Barney, 1952).

Effects (2) and (3) may be related to the fact that the /t/ and /l/ components,
when spoken respectively before the vowels /i/ and /a/, would produce a straight
transition from the consonant loci to the steady second formants of those vowels
(O’Connor et al., 1957; Liberman et al., 1954; Delattre et al., 1955). In cases where
the second-formant transition is partially masked by the noise, the listener may
assume that a straight transition occurred. On this basis, however, /o/ might
reasonably have led to more frequent responses of /sp/, since /p/ has a low locus
which would lead directly into the low second formant of /o/. This effect did not
occur in our data.

Fig. 2. Simplified patterns of confusion among initial compound consonants spoken with three
vowels and heard in two noise spectra. The more frequent confusions are represented by
the heavier dots, as in the key of Fig. 1.

Fig. 3. Simplified patterns of confusion among compound consonants spoken in final position
and heard in two noise spectra. The more frequent confusions are represented by the heavier
dots, as shown by the key on the figure. Original data pooled from tests with three vowels.


PERCEPTUAL CONFUSIONS OF FINAL COMPOUNDS 


Our attention now turns to the final compounds. The simplified summary matrices 
appear in Fig. 3. We may first compare the confusion tendencies in flat and in 
low-frequency noise at the low S/N ratios. The confusions of the final compounds in 
low-frequency noise show, as with initial compounds, the clear distinction of the 
presence of a fricative component. This distinction is masked by the high noise- 
frequencies provided by the flat noise. Among the non-fricative compounds, the 
nasal-glide distinction is much better heard in flat noise than in low-frequency noise. 
The hearing of the nasal-glide distinction is indicated by the dispersion of confusions 
in the upper left quadrant of each matrix in Fig. 3. When the confusions in this 
quadrant are restricted largely to the diagonal sub-quadrants, the nasal-glide distinction 
is well heard. This pattern is seen with the flat noise. When the confusions are 
more dispersed over the main quadrant, as with the low-frequency noise, the nasal- 
glide distinction is poorly heard. The voicing of the stop in the glide-stop compounds 
/rd/, /rt/, /ld/, and /lt/, is no better heard in flat noise than in low-frequency noise.

Within the group containing a fricative, the flat noise masks the fricative component 
of /mz/ and /nz/ but it does not obliterate its voicing: /mz/ and /nz/ are inter- 
confused or confused mainly with /nd/ and /ng/. When the fricative is unvoiced, 
the lack of voicing is well heard in flat noise except for the strong confusions of
/rs-rd/ and /ls-ld/. The stop-fricative compounds /ks/ and /ts/ are interconfused
in flat noise but in low-frequency noise they are confused with the glide-fricative 
compounds. The compound /st/ is remarkably well heard in low-frequency noise. 

It will be noted in Fig. 3 that the matrix for a high S/N ratio in flat noise shows 
very nearly the same patterns of confusion as does the matrix for the low S/N ratio. 
Therefore, our conclusions about the perceptual dimensions masked by the flat noise 
are not limited to the low S/N ratio chosen for the main series of tests. 

The individual simplified matrices for the three vowels are shown in Fig. 4. There 
appears to be only one case where the vowel preceding the final compound consonant 
has an effect on confusions of the compounds. This occurs in flat noise for the 
vowel /i/, where the compounds of a nasal and an unvoiced stop, /nt/ and /nk/, 
are more often confused with the stop-fricative compounds with this vowel than with 
the other two vowels, particularly /o/. There is also a slight tendency in flat noise for 
/o/ to cause the stop-fricative compounds to be heard as glide-stop compounds. 


RELATIVE INTELLIGIBILITIES OF COMPOUND CONSONANTS 


The data of Tables 3-6 provide a basis for comparing the intelligibilities of various
groups of the compound consonants under the conditions of the experiment. The
data were grouped in various ways and, for each group, intelligibility was calculated
as the percentage of responses showing no error. These intelligibilities are presented
in Table 7.

Fig. 4. Simplified patterns of confusion among final compound consonants spoken with three
vowels and heard in two noise spectra. The more frequent confusions are represented by
the heavier dots, as in the key of Fig. 3.

TABLE 7

A. MAIN CONSONANT GROUPS

Group                              Flat Noise                  Low-Frequency Noise
                                   S/N = -4 db   S/N = +6 db
All initials                       27.5          45.6          47.5
All finals                         40.9          54.8          50.2
Initials with fricative (8)        33.5          53.8          53.8
Initials with no fricative (7)     20.8          36.0          40.3
Finals of glide + stop (4)         47.6          68.4          43.5
Finals of nasal + stop (4)         47.2          50.3          48.2
Finals with fricative (7)          33.4          49.5          54.1

B. SUB-GROUPS OF INITIAL CONSONANTS

Group                              Flat Noise                  Low-Frequency Noise
                                   S/N = -4 db   S/N = +6 db
Fricative + nasal (2)              42.6          64.7          47.0
Fricative + l (2)                  34.2          59.8          46.6
sp and sk                          25.9          43.9          56.8
tr and tw                          22.2          44.9          59.2
pr, pl, kr and kl                  17.3          28.7          30.5

C. SUB-GROUPS OF FINAL CONSONANTS

Group                              Flat Noise                  Low-Frequency Noise
                                   S/N = -4 db   S/N = +6 db
Nasal + velar stop (2)             55.1          54.7          51.4
r + stop (2)                       53.4          72.7          54.2
l + stop (2)                       42.0          64.0          32.7
r + fricative (1)                  40.2          66.2          70.2
Nasal + alveolar stop (2)          39.5          46.0          45.1
Stop with fricative (3)            34.5          49.5          54.0
l + fricative (1)                  27.0          44.5          51.7
Nasal + fricative (2)              23.7          43.4          47.2

Intelligibility in noise of various groups of compound consonants. Table entries are the
percentage of compounds heard without any error. Numbers in parentheses give the number
of different compounds in the group.
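The intelligibility measure behind Table 7 is the percentage of responses with no error, that is, the diagonal cells of a confusion matrix. The following is a minimal Python sketch. It assumes that, for a group of compounds, the correct responses and the total responses are pooled over all stimuli in the group before dividing. The text does not say whether pooled counts or an average of per-compound percentages was used; with nearly equal row totals the two agree closely. The function name and argument layout are hypothetical.

```python
from typing import List, Sequence


def group_intelligibility(counts: List[List[int]], labels: Sequence[str],
                          group: Sequence[str]) -> float:
    """Percentage of responses to the stimuli in `group` that were correct
    (diagonal cells), with counts pooled over the whole group."""
    index = {label: i for i, label in enumerate(labels)}
    correct = total = 0
    for label in group:
        r = index[label]
        correct += counts[r][r]   # stimulus heard as itself
        total += sum(counts[r])   # all responses to this stimulus
    return 100.0 * correct / total


# Toy example with three made-up compounds "a", "b", "c".
toy = [[8, 2, 0], [1, 7, 2], [0, 3, 7]]
pct_ab = group_intelligibility(toy, ["a", "b", "c"], ["a", "b"])
```

Pooling weights each compound by its number of responses. Averaging the per-compound percentages would instead weight every compound equally.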


Intelligibilities in Flat Noise. In Part A of Table 7, it will be noted that the final
compounds were more intelligible in flat noise than the initial compounds. In initial
position, the overall low intelligibility is due largely to the non-fricative compounds,
but in final position the non-fricative compounds account for the generally high
intelligibility of the finals. It will also be seen that compounds containing a fricative
are relatively poorly heard in final position but relatively well heard in initial position.
In Part B of Table 7, we see that the initial fricative-nasal and fricative-l compounds
are the highest in intelligibility while the initial stop-glide compounds are the
lowest. In Part C, we see that the final nasal-stop and glide-stop compounds are
highest in intelligibility in flat noise while the fricative-containing compounds are
lowest.

Intelligibilities in Low-frequency Noise. The flat-noise advantage in intelligibility
of the final compounds over the initial compounds disappears when the noise spectrum is changed
to the low-frequency noise (Table 7, A). In low-frequency noise, the initials with 
fricatives maintain approximately the same superiority relative to initials without 
fricatives as they did in the flat noise. However, the final compounds with fricatives 
are now, in low-frequency noise, relatively more intelligible than the other final 
compounds, whereas, in flat noise this relation was reversed. 

In Part B of Table 7, with the low-frequency noise, the compounds /tr/ and /tw/ 
rank very high in intelligibility relative to their position in flat noise. However, the 
other stop-glide compounds are still lowest in intelligibility. 

In Part C of Table 7, some of the final compounds with fricatives become relatively 
high in intelligibility when the noise spectrum is changed from flat to low-frequency 
noise. 


SUMMARY AND CONCLUSIONS 


Representative sets of 15 two-member compound consonants were spoken and 
recorded in test syllables in initial and final position and with front, middle, and 
back vowels. The syllables were then played back for intelligibility tests in two 
noise spectra, one which masked primarily the high speech frequencies and another 
which masked primarily the low speech frequencies. The results were analyzed by 
constructing stimulus-response matrices. The matrices exhibited various patterns 
of confusion in hearing the consonant compounds. Different confusion patterns for 
the various vowels and for the two masking conditions led to the following conclusions:

(1) The perception of a compound consonant in noise is primarily determined by 
the articulatory dimensions of the components of the compound. 

(2) Different vowels do not strongly affect the perception of the component 
consonants with which they are spoken. 

(3) The high frequencies of speech (heard above the low-frequency noise) convey 
the presence of a fricative component in a compound consonant, and the place 
of articulation of nasal and stop components. 

(4) The low frequencies of speech convey the /r, l, w/-distinction among the
components and whether the manner of articulation of a component is nasal, glide,
or stop.

(5) When the liquid and semivowel components, /l/ and /w/, occur after initial
fricatives, their distinctive cues are apparently higher in frequency than when they
occur after initial stops.

(6) For the particular sets used, the final compounds were more intelligible than 
the initial compounds in flat noise. In low-frequency noise, the initial compounds 
were nearly equal in intelligibility to the final compounds. 